FOREWORD


A number of the chapters of this issue were prepared under some-
what difficult circumstances by workers who are devoting part or all their
time to war activities. 

This is the third issue of the Review to be devoted exclusively to re- 
search techniques and methods. E. F. Lindquist was invited to assume 
the chairmanship of the committee for this issue, but when he reported 
that he was giving all his available time to a new testing program and to 
war duties, the Editorial Board decided to assume responsibility for the 
planning and administration of the issue as a matter of experiencing at 
first hand the work which committees are called upon to do. This is a situa-
tion somewhat similar to that surrounding the first issue on this topic when
Frank N. Freeman, then chairman of the Editorial Board, assumed the 
chairmanship of the special issue of February 1934 on “Methods and 
Technics of Educational Research.” That number was published at the 
end of the first three years of existence of the Review. The second issue 
devoted to “Methods of Research in Education,” December 1939, did not 
appear till the end of the ninth year of the Review’s life. The second cycle 
of the Review, in other words, did not have any volume summarizing spe- 
cifically the advances and practices in research methods for that period. 
There were, however, several chapters on research methods which appeared
during that period, distributed among various issues of the REVIEW.
These were detailed in the Foreword to the December 1939 issue. 

The present issue comes at the end of the fourth cycle of the Review 
and is concerned with research methods of the past three years. Whereas 
the December 1939 issue, under the chairmanship of Carter V. Good, was
organized into nineteen chapters, the present issue condenses the outline 
into a smaller number of larger areas. This was done primarily in an effort 
to reduce the length of the issue. The topic of “appraisal” has been added 
at the direction of the Editorial Board. As is usually the case with the 
Review, much good material has had to be sacrificed to keep the length 
of the issue within bounds, and contributors have forgone the inclusion of
many references in their bibliographies which an unlimited printing budget 
would have permitted. 

This issue of the Review concludes the first full experience with the new 
schedule of topics which was adopted by the Editorial Board at its meet- 
ings in 1938 and 1939. This schedule involved the experiment of printing 
five issues dealing with particular subjectmatter fields or areas, rather than 
having these several fields treated under an organization which separated 
each one of them into elementary-school methods, high-school methods, 
psychology, measurement, and curriculum. The first three cycles of the 
Review followed the earlier pattern with only minor changes. 

The experiment with the new list of topics has seemed, at least for the 
present, to be satisfactory and the schedule is being continued for the next 
cycle in substantially the same form. An effort, however, will be made to
avoid the double issue which occurred in October 1941. Accordingly, fine 
arts will be combined with the language arts and appear as the April 1943 
issue. A new title, “Education for Work and Citizenship,” will replace the 
title “Social Studies” and will be somewhat broader. Special attention 
to research on the war and education will be given in the February 1943 
issue, which is devoted to “The Social Background of Education”; and later 
issues also will deal with the war as it impinges upon their several areas. 


Douglas E. Scates
Chairman of the Editorial Board 
CHAPTER I


Bibliographical and Documentary Techniques in 
Education, Psychology, and Social Science 


CARTER V. GOOD 


This chapter is concerned with recent literature on research procedures
which depend primarily on the utilization of documentary sources. The 
topics treated include (a) library guides and tools in education, psy- 
chology, and other social sciences; (b) historiography and legal research; 
and (c) documentary reproduction. 


New Library Guides and Tools in Education 


Limitations of space permit the listing of only the recently issued guides 
most helpful to workers. For additional and more detailed information, 
standard full-size manuals (4, 139, 140, 191, 230) are available. Most not- 
able is Alexander’s revision of his general treatise on library aids and pro- 
cedures in education (4) providing a recent comprehensive presentation. 
Thorpe’s brief guide (207) is useful, although the attempt to cover in an- 
notated fashion the sources for educational research in only 24 pages 
has resulted in omissions and in some inaccuracies. The guide by Williams 
and Stevenson (228) is pointed primarily toward the needs of under- 
graduates but should prove helpful to many graduate students. Extensive 
individual bibliographies and major summaries of research are treated 
only in terms of the guides for locating them. Most of the guides which 
were regularly published before 1936 were summarized earlier by this 
writer (87) and are not reviewed again here. It should be noted in pass- 
ing, however, that the chief serial guides at that time are still the principal 
tools for library work in education, notably, the Education Index, the 
annual U. S. Office of Education Bibliography of Research Studies in Edu- 
cation (90), the bibliographies in the Elementary School Journal and the 
School Review (which were assembled annually and published as a mono-
graph up to the 1938 references), Education Abstracts, and the REVIEW OF
EDUCATIONAL RESEARCH itself. The Readers’ Guide to Periodical Literature
and the International Index to Periodicals continue to render supplemen-
tary service.

A valuable tool for canvassing educational research literature in general 
and for obtaining overview treatments of various problems is the Encyclo- 
pedia of Educational Research (138) directed by W. S. Monroe and spon- 
sored by the American Educational Research Association. There are some 
gaps in the topics covered. The REVIEW OF EDUCATIONAL RESEARCH, also
sponsored by the Research Association, is a major guide, providing over- 
views and bibliographies of research in problem areas. As a rule, it has 
included fifteen major subdivisions of education within a three-year cycle
(as listed recently on the inside back cover). A twelve-year index of the 
contents of the REVIEW is now in process of preparation. The Encyclopedia 
of the Social Sciences (189) contains some material on education, as do 
most of the guides cited in the following sections on psychology and other 
social sciences. 

A comprehensive guide from 1910 or earlier to 1935 is the Monroe and 
Shores catalog of more than 4,000 annotated bibliographies and summaries 
listed under author and subject in one alphabet. This listing has been kept 
up to date since 1935 in part by the Education Index and also by the 
Bibliographic Index (21). This latter is a quarterly bibliography of bibli- 
ographies on a wide range of subjects, including various educational heads. 
The first number was published in March 1938. A recent, brief bibliogra- 
phy of educational bibliographies was prepared by Brickman (28). The 
Dictionary of Education (3, 82) will be helpful as a means of orientation, 
especially in clarifying concepts. 


Guides to periodicals and serials—“Serials” may be defined as “any
publication issued serially or in successive parts more or less regularly” 
(191). Most of the tools already cited relate to periodicals or serials. A 
number of general lists of serials, not necessarily educational, should be 
noted. The Ayer list (15) is a bibliography of newspapers and periodicals 
but includes much additional information. The Gregory union list of 
serials (92) shows the extent to which more than 75,000 different serials 
are found in 225 of the most important libraries in the United States and 
Canada. Ulrich’s list of 10,200 titles (208) represents the periodicals pub- 
lished in the United States and in foreign countries, especially those of 
England, France, and Germany, that have proved most useful in American 
collections. Lyle (131) has grouped his classified list of periodicals for the 
college library by academic fields of study. For more detailed information 
concerning use of the guides to serials, the reader is referred to selected 
sources (2, 4, 14, 234). These references contain tabulations of the years 
covered by the various indexes to the literature. 


Guides to educational books and monographs—The Education Index is 
still the most useful guide to books and monographs in education. School 
and Society has continued its publication of an annual classified list of 
educational books, monographs, yearbooks, and bulletins, with a selected 
list of sixty books marked with an asterisk (221). The titles of the sixty 
books of the year also appear in the April number of the Journal of the 
National Education Association (114). The United States Catalog is kept 
up to date by the monthly Cumulative Book Index, which cumulates at 
irregular intervals during the year, annually into a supplement, and after 
several years into a large supplement. Each entry includes information 
concerning author, title, edition, date, publisher, price, and paging. In a 
sense, the Publishers’ Weekly supplements the monthly Cumulative Book 
Index, in that it describes and indexes new books in a convenient reference 
and buying list. Selected bibliographies of books, such as the A.L.A.
Catalog and supplements, are helpful. For volumes later than 1936, The 
Booklist provides semimonthly selections and evaluations of books. 
The Book Review Digest offers guidance in the evaluation of some 4,000 
books during the course of a year. Detailed bibliographical references for 
these publications have been given earlier (87) and will not be repeated. 
Bridges (29) listed two hundred series of professional books, bulletins, and 
monographs in education published since 1900. 


Guides to graduate theses—The comprehensive guide to masters’ and 
doctors’ theses in education continues to be the annual Bibliography of Re- 
search Studies in Education (90) of the U. S. Office of Education. Begin- 
ning with the titles of 1912, the Library of Congress printed an annual 
volume of published doctors’ theses in all fields; this ceased publication 
with the theses of 1938 (211). Because this publication was usually two 
years late and omitted unpublished theses, another agency began a complete 
annual listing (61) with the titles of 1933-34, and since the year 1938, has 
been alone in carrying forward this work. Doctors’ theses under way in edu- 
cation have been listed annually, beginning with 1931, in the January 
Journal of Educational Research (83). Reference may be made to an annual 
summary (85) of doctors’ theses in school law, with a list of masters’ theses 
in the same field, and to an annual listing (65) of the recipients of doctors’ 
degrees in modern foreign languages. 

Many institutions now publish abstract volumes or lists of their theses, 
usually representing all the graduate work of the particular university but 
sometimes devoted to summaries or lists of the theses in education alone. 
A basic guide (156) to such summaries and lists of theses is available. An 
older source (60) also may prove helpful. The annual Bibliography of 
Research Studies in Education (90) lists such institutional summaries 
under the heading of “Research, educational—reports.” The Education 
Index offers similar guidance under the topics of: “Dissertations, aca-
demic”; “Abstracts, educational”; “Degrees, academic”; “Degrees, doc-
tors’”; and “Degrees, masters’.”


Special educational areas and problems—In addition to the guides de-
scribed above and the serial bibliographies in the following paragraph,
special aids prepared to facilitate canvassing of the literature of a limited
educational area and published since 1935 cover school administration (35),
school law (59, 99), teacher training (126), adult education (19), testing 
(195), philosophy (161), business education (93, 215), physical and 
health education (78), handicapped children (127), nursing education 
(97), industrial arts (89), rural education (53), Negro education (167), 
Boy Scouts (133), and publications of the U. S. Office of Education (214). 
Yearbooks dealing with mental measurements (36, 37, 38, 39, 40), and 
with research and statistical methodology (41, 42), enable the student and 
research worker to keep in touch with current developments in these fields. 
Dictionaries of statistical terms (123), measurement and guidance (194), 
occupational titles (210), and philosophy (16, 181) provide orientation
for interpretation of the concepts represented. 

Serial bibliographies and summaries in limited areas of education— 
Many of the previously described comprehensive guides to the literature 
are in effect continuing bibliographies. There are also serial bibliographies 
or summaries of research for a number of specifically limited subdivisions 
of education. Most such references are annual in their appearance and have 
continued over a number of years. As a rule, the bibliography or summary
for a particular year refers the reader to the earlier numbers in the series. 
Selected topics represented in the bibliography of this chapter are educa- 
tional books of the year (222), major educational projects and large-scale 
investigations (84, 188), deliberative committee reports (43), methods of 
research (99), teacher supply and demand (71), junior college (70), 
courses of study and curriculum materials (32), curriculum making (125), 
reading (91), science teaching (58), modern language teaching (205), 
physical education (2), and Negro education (119, 120, 160). 


Biographical, institutional, organization, and statistical directories in
education—A number of handbooks of information and directories include 
biographical facts concerning individuals or statistical or personnel data 
for institutions. Among the overlapping illustrative references of this type 
are those dealing with leaders in education (46, 225), specialists in philoso- 
phy (226), college and university presidents (171), public and private 
schools (158), universities and colleges (132), special resources in 765 
libraries (218), school supplies and equipment (187), educational build- 
ings and grounds (10), individual professional organizations (5, 33, 190, 
223), and registration statistics of higher education (218). The most 
widely used educational directory (212) probably is that issued annually 
in four parts by the U. S. Office of Education: I. State and County School 
Officers, II. City School Officers, III. Colleges and Universities, and IV.
Educational Associations and Directories. The educational and social direc-
tories and yearbooks for 1942 listed in Part IV of this Office of Education
publication (213) number eighty-one. If publications are issued by the 
educational and learned associations, such journals, yearbooks, or proceed- 
ings are named. The general biographical directories—Who’s Who in 
America, Who’s Who, and the International Who’s Who (109)—include a
considerable number of educators. The Directory of American Scholars 
(44) includes a large number of professors, though not mainly in depart- 
ments of education. 





Guides to Psychological Literature and Data 


Many of the educational guides cover a considerable amount of psycho- 
logical research. In fact, there are certain areas where it is difficult, if not 
impossible, to draw a sharp line between the two disciplines; for example, 
learning and conditioning, personality and character, vocational guidance, 
mental tests, or childhood and adolescence. Therefore, for selected topics 
the student of psychology may find pertinent information in the previously 
described educational guides, including the Encyclopedia of Educational
Research, Education Index, and REVIEW OF EDUCATIONAL RESEARCH.
There may be times when the student may find it desirable to use the
guides for all the several areas (education, psychology, and social science)
discussed in this chapter in working out a single problem. 

The major comprehensive guide to the literature of psychology is the 
monthly journal Psychological Abstracts (172) founded in 1927. An 
author and subject index to the abstracts printed during each year is issued 
as an extra number each December. The Psychological Index (174), estab- 
lished in 1895, suspended publication in 1936. From 1927 to 1936 the 
two journals performed an overlapping service. As the titles of the publi- 
cations indicate, one includes abstracts or brief summaries while the other 
is merely an index or list of references. For publications prior to 1927 the 
Psychological Index is the only major comprehensive guide available. 
Both of these publications cover periodical literature, books, monographs, 
and published theses. 

A list of topics in psychology should prove useful in identifying appro- 
priate headings for canvassing psychological materials (95). A handbook 
of the literature of psychology (130) and a compilation of available bibli- 
ographies (129) are valuable for the periods represented. In certain large 
areas of psychology extensive summaries and guides have been provided: 
general experimental psychology (144), social psychology (145), and child 
psychology (143). The monthly Psychological Bulletin (173) usually
publishes one or more critical surveys of the literature dealing with a spe- 
cific psychological problem. Psychology is well equipped with dictionaries 
(16, 74, 75, 105, 220), which perform an orientation function in the 
interpretation of psychological terms, concepts, principles, and procedures. 


Biographical directories in psychology—Two volumes of the Psycho- 
logical Register (147), a biographical and bibliographical directory of 
American and foreign psychologists, appeared in 1929 and 1932, respec-
tively. The 1932 volume included 2,400 psychologists from forty countries.
A projected first volume is to include psychologists who had died before the 
inauguration of the series, extending back as far as the Greek scholars. 
For more recent information, the yearbook of the American Psychological 
Association may be consulted, although this annual publication includes 
only the name, training, position, field of instruction, and major research 
interests of each individual. The 1942 yearbook lists 713 members and 
2,518 associates. American Men of Science (45) contains the biographies 
of a number of the more eminent psychologists. Many of the previously 
listed educational and general directories include information concerning 
certain psychologists, especially those engaged in teaching educational 
psychology or serving in administrative positions. The three-volume His- 
tory of Psychology in Autobiography (146) consists of extended résumés 
of the lives and works of selected psychologists, most of whom are living.


Guides to Literature and Data in Other Social Sciences


Certain of the general educational guides described earlier in this chap- 
ter contain considerable material of interest to workers in other areas of 
the social sciences. This statement is especially applicable to the Encyclo-
pedia of Educational Research, the Education Index, the REVIEW OF EDUCA-
TIONAL RESEARCH, the Bibliographic Index, and also the annual methodo-
logical summary in the September issue of the Journal of Educational
Research. 

The basic reference tool for the social sciences in general is the fifteen- 
volume Encyclopedia of the Social Sciences (189), covering the fields of 
anthropology, economics, education, history, law, philosophy, political
science, psychology, social work, sociology, and statistics. Its purpose is to 
provide a synopsis of progress in these areas and a repository of facts and 
principles. It includes biographical articles and bibliographies. 

The London Bibliography of the Social Sciences (128) is a valuable com- 
pilation of some 6,000,000 entries, arranged alphabetically by subject with 
an author index, and based on the holdings of nine London libraries. 
Public Affairs Information Service (175) is a comprehensive index of 
periodicals, books, pamphlets, and other materials, particularly those with 
emphasis on sociology, economics, and political science. It is published 
weekly and cumulates five times a year and annually. Social Science 
Abstracts promised to solve the indexing and abstracting problems of the 
social science subjects but could finance itself for only four years, 1929-32. 
The result is four volumes plus an index. Much of the pamphlet material 
indexed in the Vertical File Service Catalog (216) is pertinent to the social 
science fields. Much social science material is indexed in the general peri- 
odical guides. 

The titles of certain journals dealing with the social science fields may 
be located in the sources described in the section on guides in education. 
The index volume of Social Science Abstracts contains a long list of the 
journals represented. The yearbook of the Educational Press Association 
of America includes the titles of the social science journals of most interest 
to workers in education. 

Other useful tools in the social sciences are a compilation (115) of re- 
search guides and references and a bibliography (57) on methods of re- 
search. The comprehensive guides to published books and to theses 
described in the section on education may be used for canvassing such 
materials in the social sciences. In addition, certain continuing or serial 
guides to theses in sociology and in history are published annually: doc-
tors’ and masters’ studies under way in sociology (8), graduate degrees
conferred in sociology (8a), and doctorate dissertations in progress in 
history (6). 


Guides to special areas and problems of the social sciences—Especially 
extensive guides have been prepared for certain subdivisions of the social 
sciences: bibliographies in history (20, 55, 67, 107), dictionaries of Ameri-
can biography (110) and of American history (1), and a guide to ma-
terials in political science (34). Dictionaries of terms in sociology (157)
and in social work (240) are available. The publications of the Univer- 
sity of Chicago faculty in sociology were listed by Wirth (231). 

The annual census of social research (24) conducted by the American 
Sociological Society uses the subheadings of social psychology, history 
and theory, methods of research, social statistics, social biology, sociology 
and psychiatry, human ecology, rural sociology, educational sociology, 
community problems, sociology and social work, the family, sociology of 
religion, criminology, and political sociology. 


Social science directories and yearbooks—The social sciences are well 
equipped with directories and statistical and current events yearbooks. 
The annual educational directory published by the Office of Education in- 
cludes a useful list of educational and social directories and yearbooks, as 
well as a compilation of educational, civic, and learned associations in the
United States. A more extensive handbook (149) lists the scientific and 
technical organizations of both this country and Canada. Two surveys (80,
153) of organized research in the social sciences are now out-of-date but
may prove useful for certain purposes. 

There are directories of social work agencies (193), political leaders 
and programs (163), and municipal officers and activities (141, 142). 
The social sciences are well represented in the new Biographical Directory 
of American Scholars (44). Four of the widely used annual handbooks 
of information are the World Almanac (236), Statesman’s Yearbook 
(197), American Yearbook (11), and Statistical Abstract of the United 
States (209). Four of the better known encyclopedias publish annual sup- 
plements (12, 30, 151, 237). 

The biographical directories listed earlier in this chapter include many 
of the leading workers in the various social science fields. The Encyclo- 
pedia of the Social Sciences and the Dictionary of American Biography 
contain much biographical information concerning the workers in these
fields.


Developments in Historiography 


Education, like most fields of research, has depended primarily on pub- 
lications in history for knowledge concerning the historical method of re- 
search, including techniques for exploration of sources, criticism of docu- 
ments, and interpretation. The guides to the historical literature have been 
listed in the preceding section concerned with social science guides. The 
purpose of this discussion of historiography is to review briefly the major 
writings on the historical method as a research approach, rather than to 
deal with studies of the content of history as such. Research in the history 
of education and in comparative education has been summarized in other 
numbers of the Review (50, 68), while the historiography of three years 
ago was characterized briefly in the December 1939 issue of the Review 
(88).

The two-volume work by Thompson (206), published in the fall of
1942, is the most comprehensive review of historical writing in print, 
covering the period from earliest antiquity to the outbreak of the First 
World War, although no living historian is included and by intent no 
American writer is mentioned. In surveying the changing conceptions of 
history and the various fashions of writing it, Thompson fitted each author 
into the general intellectual background of the age represented and assigned 
to each writer his place in the development of the contemporary historiog- 
raphy. Barnes, in a single volume (17), appears to have been the first to 
attempt a history of historical writing for substantially the entire period of 
recorded knowledge, with the result that parts of his book read like a roll 
call of historians or a running bibliography. However, Barnes’s plan of 
characterizing the intellectual background of each major period, of show- 
ing how the historical literature of each era is related to its parent culture, 
of indicating the dominant traits of such historical writing, and of identify- 
ing the individual contributions of the chief writers of each period has defi- 
nite advantages by way of synthesis over individual literary essays on a 
group of historians or over any encyclopedic bibliography of historical 
writing. Shotwell (192) dealt with early records and evaluated in detail 
the contributions of Jewish, Greek, Roman, and Christian historical writ- 
ings. Shotwell has planned a second volume to cover the period from early 
Christian times to the present. 

A number of other histories of history cover more limited areas. Kraus 
(121) provided the first survey of the whole field of American historical 
writing. Three recent volumes of essays in historiography, written by 
former students of particular institutions, are significant: two series in 
American historiography (81, 106) and one dealing with selected historians 
of modern Europe (186).

Among the recent manuals directed chiefly toward the needs of beginners 
in historical writing, Nevins’ Gateway to History (150) is especially com- 
prehensive, with a multitude of interesting illustrations drawn largely 
from American history and biography. Kent (117) offered advice to the 
undergraduate senior and beginning graduate student in history in an 
attractive, forceful style. Good’s briefer analysis (86) of selected problems 
of historical criticism, interpretation, and writing synthesized points of 
view and examples from a considerable number of full-size treatises in 
historiography. Other fairly general discussions of the writing of history 
are those by Hulme (104), Kellett (116), Oman (154), and Taylor (203). 

Varied approaches, philosophies, and interpretative concepts in his- 
torical writing are represented in the literature of historiography during 
recent years. Among the more comprehensive treatments are discussions 
of history or the historical method in relation to biography (111), culture 
(185, 219), economic forces (98, 124), liberty (56), materialism (77, 94), 
science (135, 182), social science (168), and theory and philosophy (155, 
205).

Space permits mention of only the more extensive applications of the
historical method during approximately the past decade to psychology 
and social science. Dunlap (66) characterized a large number of historical 
treatises (old and new) in psychology under the headings of topical 
surveys, surveys of periods, and expositions of the views of particular 
men or groups, source books, biography, and general histories of psychol- 
ogy. Selected, comprehensive histories of psychology are those by Boring 
(23), Flugel (79), Murchison (146), Murphy (148), and Spearman 
(196). Within the past few years histories have been written for several 
areas of the social sciences: anthropology (159), economics (112), social 
thought (18, 73), and sociology (103). 

This brief review of the literature of historiography suggests the pos-
sibility of yet another step in historical writing—“a history of the histories
of historical writing.” In the course of human affairs events have trans- 
pired, then records of events have appeared, next the history based on 
such documents, considerably later the discussions of historical method or 
historiography, and recently the histories of historical writing. 


Summaries of Legal Research 


Legal research in education is a special application of the historical 
method. The documentary sources utilized are (a) statutory law (consti- 
tutional provisions and legislative or statutory enactments) and (b) case 
or common law (principles applied by the courts in deciding issues not 
covered by statutory law). Legal research shares with the historical method 
in general similar techniques for exploration of the sources, criticism of 
documents, and interpretation. 

During the past three years, 1940-41-42, the annual Yearbook of School 
Law (54) has continued as the outstanding publication in this field, with 
its contents devoted primarily to a narrative topical summary of decisions 
of the higher courts in all states of the United States in cases involving 
school law, as reported during the preceding year. This yearbook also 
summarizes the current doctors’ theses in school law and lists the masters’ 
theses in this area. Chambers (48) has tabulated the frequencies of 
doctors’ theses on legal aspects of education in terms of the thirty-three 
sponsoring institutions and by years from 1919 to 1939 inclusive. 

The major textbook publication in educational law for the three-year 
period is the volume by Hamilton and Mort (96), which is a combined 
textbook and casebook directed especially toward the problems of educa- 
tional administration. Chambers (47) has extended the 1936 volume, The 
Colleges and the Courts (72), by reviewing the judicial decisions regard- 
ing higher education in the United States from 1936 through 1940. 

During recent years the policy of the REVIEW OF EDUCATIONAL RESEARCH
has been to discuss the legal aspects of an educational problem in the 
issue where other phases of the same problem are considered, rather than 
to devote a separate number of the journal exclusively to school law. 
Within the past three years parts of certain numbers of the Review have
dealt with the legal phases of school organization and administration
(179), finance and business administration (49), planning and construct- 
ing school buildings (31), the status of teachers (221), and research 
literature (53). 


Documentary Reproduction 


Microphotography within the past decade has become the major method 
of reproducing research materials and has brought about for the historian 
a revolution in methods of work second in importance only to the develop- 
ment of printing (76). The recency of the development of the several 
techniques of documentary reproduction, especially microphotography or 
microcopying, is indicated by the appearance in 1936 of Binkley’s manual 
(22), which is the first detailed account of the various modern methods 
of reproducing research materials. In a sense, this manual is the parent 
of the Journal of Documentary Reproduction (113), since one reason for 
founding the journal in 1938 was to continue and keep up to date Binkley’s 
pioneer work. Parts of the manual have been superseded by later and 
briefer publications but it is still the only basic reference volume for the 
subjects covered. Developments of 1936 and 1937 are covered in the two 
volumes (176, 177) that report the papers presented as microphotography 
symposia at the 1936 and 1937 conferences of the American Library 
Association. 

There are four major uses (27, 122, 170, 201) of microphotography 
in furthering the work of scholars and research workers: 

1. Negatives of materials not available in local libraries or in any single library 
may be made in many distant places and combined into a complete collection or 
series; for example, a newspaper or journal series or some other periodical collection. 

2. Materials from distant libraries that cannot be visited, particularly collections 
in foreign countries, can be reproduced for local use. 

3. Original publication of research, such as doctoral dissertations, is possible. 
University Microfilms publishes at intervals a volume of abstracts (134) of doctoral 
theses that are available in complete form on microfilm. Power (169) has outlined 
the problems and procedures involved in publication of theses by microfilm. Paul 
Monroe’s work (136, 137) on the history of education combines the process of 
printing (Volume I, the textbook) and microfilm (Volume II, a collection of 
readings from the documents and source materials referred to in the textbook). 

4. Preservation of documents that face disintegration, destruction, or damage
through the ravages of time, wear, fire, or war. The space saved in the case of bulky
materials, such as newspapers, is considerable; a filmed volume on a newspaper occupies
approximately one-fiftieth of the space of the original.


The rapidity with which the work of microcopying has progressed is 
evidenced by the new Union List of Microfilms (160), which includes 
5,221 items. The editorial committee in charge of this list hopes to issue 
annual supplements to keep the compilation up to date. Descriptions of the 
microcopying projects of individual libraries or organizations are avail- 
able in the literature (76, 118, 201). Stewart (200) outlined fifteen prob- 
lems to be solved together with recommendations for improving the uses 
of microphotography for scholarly purposes.


CHAPTER II


Analytic, Synthetic, and Diagnostic Studies of
Individuals¹


RUTH M. STRANG 


Methods of studying personality have far broader applications today
than ever before. They are essential to effective war effort and to the 
implementation of democratic principles. Prediction is necessary in the 
placement of personnel in the military services and in the selection of 
personnel in government and industry. In the problems of postwar recon- 
struction a knowledge of the unique contribution each person can make 
will be basic to the effective functioning of a democratic society. 

Theories of personality may either be derived from novel variations 
in methodology, or they may dictate the methods of study employed in a 
given research. In the former instance, new light may be thrown on the 
structure and organization of personality; in the latter instance, the theory
may be extended or clarified. Various theories of personality are repre- 
sented in the publications of the last three years, but the most significant 
advances have been in the direction of study of the inner organization 
and the uniqueness of personality. 


Persistence and Limitations of Paper-and-Pencil Tests 


In his 1940 review of trends in clinical procedures and psychotherapy, 
Watson (80) noted the marked decline in interest in paper-and-pencil 
personality tests. Yet in spite of the skepticism regarding these instru- 
ments as repeatedly expressed by clinical workers, the experimental 
evidence of their lack of validity, and the arguments against this method 
of assessing personality as summarized by Vernon (79), new inventories 
and scales are still being devised, each with some unique and commend- 
able feature. Although the paper-and-pencil tests have some value as 
screening devices for detecting maladjustment in groups, as approaches 
to the interview, and as a basis for clinical study of individual responses 
and patterns of responses, interest in these instruments is being supplanted 
in recent literature by more enlightening and more valid approaches. 


New Uses of Familiar Instruments 


The use of definitely structured material such as standardized tests has 
certain qualitative possibilities. These qualitative aspects are emphasized
in the administration of the Bellevue-Wechsler intelligence test (81).
For example, the way in which a subject defines the words on the
vocabulary test tells a good deal about the “quality and character of his
thought processes,” his cultural milieu, and schizophrenic tendencies.
Observation of a subject at work on the object assembly test reveals
something about his thinking and working habits, ability to work toward
an unknown goal, persistence in completing a task, and sometimes artistic
and mechanical ability. Porteus (64) recognized similar possibilities in
the maze tests, offering as they do relatively great latitude of response
to the subject and thus providing “particularly fertile material for some
aspects of personality diagnosis.” Gerlach (27) made a contribution to
methodology in her comparison between psychometric pattern as indi-
cated by the Stanford-Binet and Cornell-Coxe quotients and asocial and
aggressive personality types as found in case histories.

¹ The author is indebted to Margaret McKim for assistance in locating and reviewing references in the
section on “Diagnosis of Difficulty in School Subjects.”

² See Chapters VI and VII of this issue of the Review for further discussions of means of assessing
various human traits or abilities.

Another example of new uses of old tests for personality diagnosis is 
the new scoring keys for the Strong Vocational Interest Blank. Tussing 
(78) reported that the validities of the keys in the areas of home, health, 
and emotional adjustment were low, but that self-confidence and sociability 
could be predicted fairly accurately by this method. Sheviakov’s work (72) 
is another example of the clinical treatment of a psychometric technique. 
Instead of merely taking the items on the interest test at their face value, 
Sheviakov grouped and studied the responses in various ways and was 
able to construct an apparently valid personality picture of each subject. 
The diagnosis of personality through a study of the relationships between 
tests or parts of tests and through attention to the qualitative aspects of 
a person’s responses is a promising development, probably stimulated 
by the success of the clinical treatment of subjects’ responses to un-
structured situations.


Personal Documents and Self-Analysis


The most important contribution in this area is the historical survey
and evaluation of the critical literature and experimental studies made
by Allport (3).


The personal document may be defined as any self-revealing record that intentionally 
or unintentionally yields information regarding the structure, dynamics, and func-
tioning of the author’s mental life (3: xii).


There are various forms of “first-person human documents” ranging from 
personal accounts, with no checks or technical aspects, to critical and 
experimental studies. Included in this category are autobiographies, 
questionnaire responses, verbatim recordings, diaries, letters, and expres- 
sive and projective documents. Allport found personal documents useful 
in research, in teaching, in the construction of questionnaires and typolo-
gies, and in social psychology. In comparing the advantages of personal
documents with their disadvantages he found most of the latter irrelevant
or trivial.

A high point in the intentionally revealing personal document is repre-
sented by the self-analysis of “Clare” reported by Horney (39). In this 
case the procedure by which a patient may not only let her thoughts, 
feelings, and impulses emerge, but also use her critical intelligence in 
their interpretation, is minutely described. Such a document reveals the 
“why” of behavior as the patient sees it in her effort to effect a better 
adjustment. The value of such a self-analysis for research on the springs 
of conduct is obvious. 


New Emphases in Observation 


The technique of observation was so comprehensively reviewed by 
Jersild in the December 1939 issue of the Review that little need be 
added here about this important instrument of scientific research. Perhaps 
the chief developments during the past three years have been its applica-
tion on the college level, whereas formerly the vast majority of investiga-
tions were in the preschool field; and more use of the method as part
of a unified approach to the study of the individual as a whole. Jarvie 
and Ellingson (41) described in detail, with many examples, the recording
of behavior and the interpreting and implementing of the “anecdotal 
behavior journal” in their institution. McCormick (53) concluded from 
his experience with anecdotal records in the secondary schools of Springfield,
Missouri, that this form of observation “has proved to be a feasible
and useful technique” in the public school. An example of the clinical 
diagnostic use of observation was described by Brown (13) who used an 
experimental situation in which to observe the reactions of psychiatric 
patients to thwarting. He found a continuum of reactions which suggested 
to Watson (80) a possible “objective analysis of psychoses which are 
now measured only with adjectives like ‘mild’ and ‘severe.’”

The most pervasive use of observation during this three-year period 
was reported by Lerner and Murphy (51). In their research observation 
was the basic method not only in natural nursery-school situations, during 
the intelligence testing and the pediatric examination and with the music 
teacher, but also in a variety of experimental situations. The emphasis in 
all these observations was on the “how” and the “why” of behavior. Every 
activity carried on in the nursery school was considered as an opportunity 
“to understand what it must feel like to be this 3- or 4-year-old, and . . .
to help children to be their most effective selves” (51: 247). 


Single-Aspect Approaches 


The approach to personality through a single aspect might be called 
the “flower-in-the-crannied-wall” technique. It is true that an individual’s 
pattern of personality may be revealed through his handwriting, his gait 
and other expressive movements, his speech (70), his free associations 
(30), and his sense of security (69). 

The biological basis of personality and the chemical substratum must 
also be recognized. “Certainly all individuals cannot respond similarly
to exactly the same experiences. This fact in itself demands analysis and
explanation” (73). Kahn (46) advocated an endocrine examination to
determine whether a flyer’s symptoms of tenseness and nervousness,
insomnia, and psychomotor tension are due to maladjusted endocrines or
to hereditary structural and physiological weakness of the nervous sys-
tem. Methods and instruments used in studying anatomical, biochemical,
physiological, and medical aspects of adolescent development are described
in detail by Greulich (32).


The “Interpersonal Relations” Approach 


The definition of personality as a person’s “social stimulus value”
naturally leads to a methodology involving classmates’ or associates’
opinions or ratings. Using a modified form of the “guess who” technique,
Tryon (77) was able to make especially enlightening sketches of individual
children who received extremes of scores derived from classmates’ opinions.
Jennings (42) advocated the study not only of “the individual’s emotional- 
social expression choice and rejection of others but similarly the expres- 
sion of other persons toward him.” Bonney (12), Zeleny (84), and many 
others reported sociometric investigations in Sociometry during the past 
three years. The study of the individual as a social atom throws light
on such problems as the choice process and patterns, on the characteristics 
of persons in isolated or near-isolated positions as contrasted with indi- 
viduals in leader positions, and the consistency of the “internal structure” 
of the social atom (42). 
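The tallying behind such sociometric summaries can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any study cited above: the function name `sociometric_summary`, the `leader_min` threshold, and the four pupils are all hypothetical.

```python
from collections import Counter


def sociometric_summary(choices, leader_min=3):
    """Tally choices received, isolates, leaders, and mutual choices.

    `choices` maps each member to the list of associates he or she chose.
    `leader_min` is an assumed cut-off, not a published standard.
    """
    members = set(choices)
    for chosen in choices.values():
        members.update(chosen)

    received = Counter()
    for chooser, chosen in choices.items():
        for name in set(chosen):          # count each chooser once per person
            if name != chooser:           # ignore self-choices
                received[name] += 1

    isolates = sorted(m for m in members if received[m] == 0)
    leaders = sorted(m for m in members if received[m] >= leader_min)
    mutual = sorted(
        (a, b)
        for a in choices
        for b in set(choices[a])
        if a < b and a in choices.get(b, [])
    )
    return {
        "received": dict(received),
        "isolates": isolates,
        "leaders": leaders,
        "mutual_pairs": mutual,
    }


# Hypothetical classroom data, for illustration only.
choices = {
    "Ann": ["Bea", "Cal"],
    "Bea": ["Ann", "Cal"],
    "Cal": ["Ann"],
    "Dot": ["Cal", "Ann"],
}
summary = sociometric_summary(choices)
```

With these invented choices, Dot is chosen by no one and so appears as an isolate, while Ann and Cal each receive three choices and meet the assumed leader threshold.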


Developmental Approaches 


The genetic approach to the study of personality need merely be men- 
tioned here because the December 1941 issue of the Review dealt fully 
with the topic Growth and Development. Since changes in structure are 
closely correlated with changes in function, the clinical study of develop-
ment makes an important contribution to the study of personality. This
point of view is exemplified in Developmental Diagnosis by Gesell and 
Amatruda (28) and by Campbell and Weech (19) in their attempt to 
arrive at a mathematical description of certain aspects of a child’s devel- 
opment against the background furnished by his peers. 


The Case Method 


Olson’s review of the case method in the December 1939 issue of the
Review opened up a wide field of usefulness in “the establishment of
professional practices and scientific generalizations.” His emphasis on the
value of case studies for administrative information, for the evaluation
of programs, for curriculum and instruction, for illustration and validation
of statistical results, and for scientific generalizations was an original
and significant contribution.

The three most significant trends are (a) the attempts to standardize
and to quantify case studies, (b) the use of case studies in the prediction
of personal adjustment, and (c) the application of statistical method to
the single case. None of the statistical methods (statistical treatment of
each item or factor as an independent unit, partial and multiple correla-
tion, factor analysis, matrix algebra) has been wholly satisfactory.
Burgess (17) reviewed four possible procedures: (a) intuitive generaliza- 
tion, (b) analysis of the case in all its individuality, (c) prediction by 
way of typology, and (d) analysis of the data according to fourteen 
factors that seemed to be dynamic. After applying the fourth procedure, 
Burgess questioned “whether a scientific method of analyzing factors will 
be superior, or even equal to the intuitions of persons gifted with deep 
understanding of human nature” (17:348). 

The use of case studies in the prediction of personal adjustment is most 
thoroughly reviewed by Horst and collaborators (40), with the conclusion 
that case study data are definitely relevant for prediction. Stouffer (74) 
suggested a fusion of the intuitive and the statistical approaches: the 
intuitive selection of variates or configurations which the investigator 
thinks important in an individual and the comparison of this dynamic 
configuration with that of other individuals whose success or failure is 
known or with time-sequence records of success and failure within the 
individual case. The study of cases, according to Cottrell (20), “is aimed 
at isolating syndromes and typical personality patterns which experience 
has shown to be correlated with certain resulting behavior, problems in 
adjustment, success or failure in some activity” (20:3609). The study of 
cases involves the three steps of synthesis, genesis, and prediction: (a) how 
one views his life situations; (b) how one came to have such a point of 
view; and (c) what one’s attitudes and overt responses are most likely
to be under specified circumstances. 

The use of prediction as an approach to the study of personality as a 
whole was suggested some years ago by Barbara Burks and facetiously 
called the “He Would” technique. More recently F. H. Allport and 
Frederiksen (1) applied this method with college students as experi- 
menters and subjects. Their predictions of responses which acquaintances 
would make to a verbal dilemma were only slightly better than chance, 
but would probably have been much higher if “the teleonomic pattern had 
been successfully predicted.” 

The application of statistical methods to the single case introduces “a 
new conception of a ‘population’ for statistics—a population of events 
and traits within the boundaries of one person” (3). By dividing material 
on an individual case into incidents which are then classified according 
to object discussed and attitude expressed, Baldwin (4) built the structure 
of an individual’s personality pattern which agreed closely with clinical 
judgments of the original material. 
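The idea of treating the incidents within one person as a statistical “population” can be illustrated with a minimal Python sketch. It only tabulates how often each object-attitude pairing occurs in a single case; it is not Baldwin's actual procedure, and the function name `tabulate_incidents` and the sample incidents are hypothetical.

```python
from collections import Counter


def tabulate_incidents(incidents):
    """Count (object, attitude) pairings within one individual's material.

    `incidents` is a list of (object discussed, attitude expressed) tuples.
    Returns a mapping from each pairing to (count, share of all incidents).
    """
    table = Counter(incidents)
    total = sum(table.values())
    return {pair: (n, n / total) for pair, n in table.items()}


# Invented incidents from a single hypothetical case, for illustration only.
incidents = [
    ("son", "affection"), ("son", "affection"), ("son", "resentment"),
    ("money", "anxiety"), ("money", "anxiety"), ("work", "pride"),
]
table = tabulate_incidents(incidents)
```

Pairings that recur far more often than others in such a table are the kind of pattern that could then be compared with clinical judgments of the same material.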


Projective Methods 


The materials of projective methods present to the subject a stimulus-
situation which is unfamiliar or “unstructured”; in responding to this
situation he reveals the way in which he organizes experience, and thereby
the skilful investigator gains insight into the subject’s “private world of
meanings, significances, patterns and feelings” (25). This is the essence
of the various projective techniques. They constitute the chief method
thus far evolved of studying the individual as a dynamic whole. The most
widely used of these techniques is the Rorschach test. A great increase in
interest in the Rorschach test has been evident during the last three
years. The Rorschach Research Exchange has been active in disseminating
experience and mutual criticism and the American Journal of Ortho-
psychiatry has published numerous articles on technical aspects of the
Rorschach test. Two books, both containing extensive bibliographies,
have recently been published giving details of administration, scoring,
and interpretation. Klopfer and McGlashan (48) gave detail comparable
to that of the Terman and Merrill manual for measuring intelligence; their
book is an indispensable handbook for beginners. Bochner and Halpern (11)
likewise presented the Rorschach test as a method of personality diagnosis and
included helpful case records and protocols obtained from different types
of persons. Continued research is needed on interrelations within a given
individual response which help to identify brain injury, schizophrenia,
and functioning intelligence; on the validation of the Rorschach method
by “blind” interpretations and by comparisons with psychiatric case
studies; on factors in personality on which the Rorschach method can
be expected to give evidence; and on sources of errors in the Rorschach
test (5). A recent development, stimulated by war needs, is the modification
of the Rorschach method for use as a group test (354). Another important
development is the increasing use of projective techniques as part of a
comprehensive study of individuals.


Synthesis of Data from Comprehensive Sources


The culmination of fused theory and methodology is found in research 
in which significant data are collected by a variety of appropriate methods 
—psychoanalytic and projective as well as psychometric, developmental 
as well as cross-sectional, social case work, medical, observational, and 
the recording of the physical and psychological environment—and these 
data are synthesized into a structural personality pattern with “manifold 
roots and manifold effects.” 

Several major investigations have developed a methodology in the field 
of personality research along these synthetic and comprehensive lines. 
The Macfarlane “guidance study” (56) is making a unique contribution 
in the procedure of analysis of developmental material within each group
of data and in the interrelations among different kinds of data. The
Adolescent Growth Study reported by Jones (45) is another outstanding
many-aspect developmental study of personality with more than usual
attention to biological factors.


The adolescent study reported by Brown and other members of the 
Adolescent Study Staff (14) is exceptional in its being conducted under
public-school conditions and in its ingenious charts for synthesizing infor-
mation about individual pupils—their goals and purposes, their social 
relationships, and the interaction among different aspects of their 
development. 

Burks (18) described in detail her method of studying “identical twins
reared apart under differing types of family relationships,” a method
which “seems to offer promise of shedding some light upon the nature of
traits themselves: their focal character, their variable modes of expression
and their developmental transformations.”

This research is an excellent example of insightful analysis and synthesis 
of comprehensive data by means of which “trait organization” or “focus” 
within the individual may be studied. 

Although the Harvard Psychological Clinic’s Explorations in Per- 
sonality (30) was published a year previous to the limits set for this 
review, the reader should be reminded that this research represents the 
most thorough application of previously developed, and of original, sub- 
jective methods to the study of the personality of college students that 
has yet been made. It is the best single source of theoretical interpretation 
of the dynamic, unified approach to the study of “really significant con- 
gruences in personality.” 

On the preschool level a similar approach has been made by Lerner 
and Murphy (51). This research is exceptional not only in the develop- 
ment of new projective techniques for use with young children, but 
especially in the interpretation of the children’s responses and in the 
focusing of observation on the “why” of their behavior. The experi- 
menters attempted to organize the detailed records into a descriptive 
picture and to formulate hypotheses concerning the child’s temperament 
and “foci of emotional drive in terms of conflict, hostility, longing, guilt,
pleasure, etc.” The hypotheses were then checked in terms of repeated 
evidence from the experiment itself, from subsequent experiments, by 
information from adults, and by further observation and experiments 
designed to test the hypotheses. 

It is especially significant that this synthetic, comprehensive approach 
is being used in the selection of army officers in England and in Germany. 
The examination, according to an article in the London Times in June 
1942, lasts for two days and includes not only the administration of 
individual tests, but also observation of conduct and attitude in a number 
of practical situations. The life history is considered important and em- 
phasis is placed on qualitative aspects in the study of the total personality. 


Diagnosis of Difficulty in School Subjects 


The trend toward a comprehensive synthetic approach to the study of 
the individual, as already noted, is perceptible in the diagnosis of difficulty 
in school subjects. Olson (61), in his longitudinal growth studies of 
school children, presented reading as an integral part of total growth and 
cited instances of reading development as an aspect of the growth of the
child as a whole. The same concept of reading as “a function of the total
growth of the child” was emphasized in the report of the third annual 
Conference on Reading sponsored by the University of Chicago and edited 
by W. S. Gray (31). Witty (82) likewise has been a vigorous advocate 
of the study of the whole individual as a basis for dealing with reading 
difficulties. His list of diagnostic procedures is therefore extensive: results 
of standardized tests, sensory and physiological functions, the interest 
inventory, diagnostic checklist of pupil’s reading, observations of pupil 
and home, trait rating scale and reading evaluation, physical and medical 
data, and home information report. These should yield a wealth of infor- 
mation for an insightful analysis. The Examiner’s Diagnostic Reading 
Record for High School and College (76) serves as both a guide to reading 
diagnosis and a convenient form for summarizing the breadth of infor- 
mation considered of diagnostic value. 

The concept of reading patterns has not yet been as generally recognized 
as the concept of personality patterns. A recent research (75) represents 
a transition from the statistical analysis of masses of data to the insightful 
synthesis of case material in the field of reading. The case studies obtained 
in this investigation focused attention specifically on reading interests 
and responses and the reading case workers employed a procedure “some- 
where between the standardized reading-test procedure and the flexible 
social or psychiatric case study method” (75:5). This combination of 
the interview and tests provided for observation by the examiner and 
some introspection by the subject as well as quantitative measures of 
reading speed, comprehension, and interests. 

Much more emphasis has been placed on the use of tests than on the 
whole-child or reading pattern approach for diagnostic purposes. The 
readiness or prognosis test to detect weaknesses and to indicate teaching 
emphases necessary to prevent failure has been developed in both the fields 
of reading and arithmetic. According to Brueckner (16):


. . . the most important function of readiness tests in both reading and arithmetic
is not prediction of success in the primary grades or at any other grade, but the
diagnosis of factors likely to interfere with learning at any level of the school, at any
stage of development, or in the study of any particular process or topic in the
curriculum.


Other tests of broad and important reading abilities have practical
diagnostic value. Examples of tests of decided value in diagnosing reading
difficulties are the test of critical thinking in the social studies designed
for Grades IV, V, and VI by Wrightstone (83), the test of critical
reading with reference to problem solving in the intermediate grades
developed by Gans (26), the test of reading social studies materials in the
high school by Martin (57), and the test of the reading of elementary algebra
by McKim (55), the last two being based on typical reading demands made
by their respective subjects.

Analysis of errors has long been a common method of diagnosing
difficulty in arithmetic and reading. Bennett (6) analyzed 34,274 errors
made by retarded readers in the recognition and pronunciation of 237
basic words in a contest. She pointed out that errors do not occur in a 
haphazard way. Further advances in individual diagnosis through the 
analysis of errors may be made by studying patterns of errors and relating 
these to other factors such as conditions under which certain combinations 
of errors occur. 

Other approaches to the diagnosis of reading difficulties are through 
the study of separate factors: vision (62), visual fatigue (23), eye- 
movements (29), level of aspiration (38), “organized comparisons be- 
tween meanings” (66), mental ability (59), and reading interests and 
experiences (63). 

Details of diagnostic procedures are supplied by the report by Hildreth 
and Wright (37) of the remedial reading class of eighteen pupils, by 
Preston’s (65) case histories of forty pupils referred for remedial work 
after one to nine years of failure in reading, and by Durrell in his book 
Improvement of Basic Reading Abilities (24). 

One of the most valuable diagnostic procedures in spelling is to ascer- 
tain a child’s grade level on accepted spelling lists. Betts (7) provided 
such a list of 8,645 words for Grades II to VIII, giving the median grade 
placement, frequency in seventeen spellers, and grade range for each. 
Another spelling scale was published in the same year by Bixler (8). 
This contains a list of 3,679 words with tables indicating the percentage 
of pupils who can spell each word at each grade level from II to VIII.
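How a child's grade level might be read off a graded spelling list can be sketched in Python. This is a rough illustration, not the scoring rule of either published scale: the function name `spelling_grade_level`, the 75 percent criterion, and the sample results are assumptions made for the example.

```python
def spelling_grade_level(results_by_grade, criterion=0.75):
    """Return the highest consecutive grade at which the child meets the criterion.

    `results_by_grade` maps a grade number to a list of booleans
    (True = word spelled correctly). `criterion` is an assumed
    proportion correct, not a value taken from any published scale.
    """
    level = None
    for grade in sorted(results_by_grade):
        attempts = results_by_grade[grade]
        if not attempts:
            continue                      # no words given at this grade
        if sum(attempts) / len(attempts) >= criterion:
            level = grade
        else:
            break                         # stop at the first grade not mastered
    return level


# Invented results for one hypothetical pupil, for illustration only.
results = {
    2: [True, True, True, True],
    3: [True, True, True, False],
    4: [True, False, False, True],
    5: [False, False, True, False],
}
level = spelling_grade_level(results)
```

For this invented pupil the criterion is met on the Grade II and Grade III words but not on the Grade IV words, so the sketch places him at Grade III.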

It has been disappointing that more attention has not been given to 
the diagnosis of process such as Brownell (15) developed in arithmetic 
and Joseph Dewey in reading. Such diagnosis requires direct observation 
on the part of the investigator and introspection or verbalization on the 
part of the child. In this way the mental processes which lead to correct 
or to incorrect solutions can be ascertained. Two children may have the 
same score on a reading test but their reading ability may be quite dif- 
ferent because of differences in their methods of work. In the individual 
testing situation these different patterns of methods of work may be 
diagnosed, for the examiner can ask the subject to explain how he solved 
the problems or did the tasks. 


Significant Developments in Methodology 
of Studying Individuals 


Among the most significant developments in methodology are: 


1. The inclusion of additional sources of data in the study of an indi- 
vidual—the environmental setting, the situations in which responses are 
made, more use of unstructured and experimental material, observations 
directed toward the “why” of behavior and social interaction, and the in- 
creased use of personal documents. 

2. Increased use of the genetic approach beginning with the young child 
and working forward rather than beginning with a maladjustment and 
working backward.

3. “Insightful” analysis and synthesis into a unitary structure of the
comprehensive data collected. 

4. The search for persistent, pervasive trends in personality that mani- 
fest themselves in specific behavior and are allied to conation, purpose, 
striving. 

5. The cautious approach to any quantitative analysis of personality, 
with the recognition that the attempt at standardization may “disrupt 
dynamic patterns.” 

6. The attempt to predict from case study material. 

7. The formulation of hypotheses concerning human personality from 
the study of individual cases, recognizing that although the single case 
does not discover a law, it does discover that there is a law, and that 
therefore the individual case has research importance. 

8. The clearer recognition that appraisals of personality reflect the 
personality and training of the investigator and that the psychologist 
must therefore “himself become an instrument of precision.” 

9. The recognition of individual differences among subjects in their
response to different methods, which may make one approach more
appropriate for one person than for another.

These trends represent progress in the study of the uniqueness of per- 
sonality and in the diagnosis of maladjustment. They might well be 
applied to a greater extent in the diagnosis of difficulties in school sub- 
jects. Each trend implies a criticism as well as a commendation of the 
present status of diagnostic studies of individuals. 


CHAPTER III


Survey and Trend Studies 


DAVID SEGEL 


THIS CHAPTER is concerned with studies involving mass data which
represent conditions or trends. The treatment is divided into (a) school 
surveys—state, city, and community; (b) large-scale testing programs; 
and (c) studies of language. 

General problems of methodology in making status and trend studies 
are discussed by Segel (30) and by the Committee on Educational Re- 
search at the University of Minnesota (21). 

Sears (28) pointed out that the survey is not itself a definite research 
technique but is rather the over-all interpretation and synthesis of facts 
discovered through more detailed research techniques. In considering 
the methodology of the school survey, two aspects should be emphasized: 
one is the application of new techniques found to be useful in gathering 
data in surveys, and the other is the general approach to the problem 
and the synthesis of the results. A survey properly conceived and executed 
is the most comprehensive and at the same time the most valuable of all 
research undertakings, because it deals with the broad aspects of actual 
field (or life) situations. 


State School Surveys 


The Regents’ Inquiry into the Character and Cost of Public Education 
in the State of New York (1, 6, 10, 11, 17, 20, 25, 26, 32, 33, 39, 40, 41) 
set up first of all a general objective: to discover any failure of the New 
York State school system to meet the needs of youth. This general objective 
was broken down into workable units of inquiry, such as the needs of 
youth in occupational, civic, or recreational areas. 

The survey did not stop with examining directly the school program, 
as is customary in school surveys, but also examined the effect of the 
school on former students and on the out-of-school activities of students. 
Two thousand former students of the high schools of the state of New 
York were interviewed regarding their major problems, attitude toward 
their present jobs, source of any advice received in regard to vocations, 
reading activities, club activities, movie attendance, training being re- 
ceived on the job, and hobbies. The information obtained from the 
employers was concerned with the initiative shown and other evidence
of satisfactory work. The data were considered in relation to the curricu- 
lums that the young people had followed while in school. Through this 
type of analysis there was a definite attempt to correlate the conditions 
of school environment with adjustment in later life. 

This survey exemplifies the following desirable steps:


1. Setting up objectives of the survey. 

2. Determining the types of investigation which will secure data bearing on the 
objectives of the survey. 

3. Gathering primary data through whatever instruments are most pertinent. This 
step involves the investigation of the school and its program, and also of the product 
of the school. 


4. Synthesis: studying the interrelationships of various data gathered.

5. Formulation of conclusions regarding needed changes in the school system.


Another type of state survey is the study by Mort and Cornell (23). 
This study dealt with the adaptation of school practices to changing 
needs in nine phases of work. Critical innovations were the kindergarten, 
reorganized high schools, special classes, homemaking for boys, adult 
leisure classes, extracurriculum activities, elimination of final examina- 
tions, integrated curriculums, and supplementary reading. The degree 
to which these practices existed in the schools of Pennsylvania at different 
points of time was studied through questionnaires sent to all first-,
second-, and third-class school districts, and a one-tenth random sample 
of the fourth-class districts of the state. The growth throughout the state 
in each adaptation was traced. 
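
The sampling plan just described, a complete canvass of the first-, second-, and third-class districts together with a one-tenth random sample of the fourth-class districts, can be sketched as follows. The sketch is illustrative only: the record layout, the function name, the made-up roster, and the fixed seed are assumptions and are not taken from the Mort and Cornell report.

```python
import random


def select_districts(districts, fraction=0.10, seed=1942):
    """Return every district of class 1-3 plus a random fraction of class 4.

    `districts` is a list of (name, district_class) pairs; this layout is
    hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so that the draw can be repeated
    canvassed = [d for d in districts if d[1] in (1, 2, 3)]
    fourth_class = [d for d in districts if d[1] == 4]
    k = round(len(fourth_class) * fraction)  # size of the one-tenth sample
    sampled = rng.sample(fourth_class, k)    # simple random sample, no repeats
    return canvassed + sampled


# Made-up roster: 6 larger districts and 200 fourth-class districts.
roster = [("D%d" % i, 1 + i % 3) for i in range(6)]
roster += [("F%d" % i, 4) for i in range(200)]
chosen = select_districts(roster)
print(len(chosen), "districts would receive the questionnaire")
```

Because the fourth-class districts enter at a rate of one in ten, any statewide total built from such returns would ordinarily weight each sampled fourth-class district by ten; whether the original study did so is not stated in this review.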

An interesting part of this report is a supplement describing the research 
methods used in the study. “The authors have attempted to avoid limita- 
tions of particular techniques, methods, and points of view, and have 
moved forward in their study with an effort to make use of such tools 
as best served the field. In our own minds, we feel that we have avoided 
the pitfall of being drawn into polarization of point of view with reference 
to the whole and the part method, the mass descriptive approach versus 
the limited case study upon carefully isolated factors, or the statistical 
versus the non-quantification approaches. ... We realized that along 
with methods of well-established utility we have employed some of doubtful 
or uncertain reliability and validity. All this was done with the purpose 
in mind of utilizing all tools at hand toward the end of examining a very 
intricate and dynamic complex, the process of adaptation” (pp. 435-36).


City School Surveys 


Both the St. Louis (35) and the Pittsburgh (36) surveys established 
new trends in city surveys through the use of several new techniques in 
getting at civic and social growth in school children. In the St. Louis 
survey, for example, several of the newer tests of social and civic com- 
petence were given. An important comparison was that made between 
the scores on these tests and the number of semesters of work in the 
various social studies that the pupils had taken. Among the instruments 
used for this purpose were Judgments Characteristic of the Socially Com- 
petent Person; Test of Critical Thinking in the Social Studies; What Do 
You Think?; Ordon Social Science Test; and the Melbo Social Science
Survey Test. 


REVIEW OF EDUCATIONAL RESEARCH Vol. XII, No. 5 








An important change in the use of measurement in city surveys is 
illustrated by the Pittsburgh survey. Tests of achievement in subject fields 
such as reading, arithmetic, and English, were not given in the survey, 
because the Research Division of the city schools had data from such 
tests available in its files which were used by the survey staff. This survey 
took an important step by investigating the extent to which test results 
were used by the schools to individualize instruction and provide indi- 
vidual guidance. A school survey should not only determine general 
levels of competence but also see if the schools are using all the facilities 
possible to determine individual differences and adapt their instruction 
and guidance accordingly. An excellent analysis of the research procedures 
of the St. Louis survey has been made by Caswell (3). 

Surveys are using new techniques of evaluation but still suffer from the
general practice of uncontrolled observation of classroom work. Com- 
prehensive school surveys should take advantage of the new instruments 
of general evaluation being evolved for secondary and elementary educa- 
tion, such as those developed by the Cooperative Study of Secondary 
School Standards and those developed in the Research Division of the 
state department of education of New York. 

The Columbus, Ohio, survey of health and physical education activities 
(12) is an excellent example of a survey of an important area of a school’s 
activities. A new method of getting at the mental health of students was 
tried, through employing the following criteria:

1. The child is considered a chronological misfit if his age differs from the median 
age of his classroom group by more than one year. 

2. A child is regarded as an intellectual misfit if his mental age is more than one 
year below or more than two years above the median for his own classroom. 

3. A child is regarded as an academic misfit if his reading achievement is more 
than one year below or more than two years above the median of his classroom. 


4. A child has a reading disability if his reading age is more than one year below 
his mental age. 

5. A child is regarded as a school failure if he is repeating his grade (or half-grade).

6. A child is regarded as a truant if he has been a truant from school during the 
current term. 


7. A child is regarded as a behavior or personality problem if he rates low on one 
of the better behavior rating devices, or the Personal Index, or the California Per- 
sonality Test. 

In order to be considered as a mental hygiene case the child must have 
failed in two or more of these criteria. 
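
The seven criteria and the two-or-more rule above can be expressed as a short screening routine. This is a minimal sketch under stated assumptions, not the procedure of the Columbus survey staff: the field names are hypothetical, reading achievement is treated as a reading age in years, and a low behavior rating is assumed to have been decided beforehand from one of the rating devices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pupil:
    """One pupil's record; all ages in years. Field names are hypothetical."""
    age: float
    mental_age: float
    reading_age: float
    repeating_grade: bool
    truant_this_term: bool
    low_behavior_rating: bool  # assumed to be settled beforehand by a rating device


def outside_band(value: float, median: float, below: float, above: float) -> bool:
    """True if value is more than `below` under, or more than `above` over, the median."""
    return value < median - below or value > median + above


def criteria_failed(p: Pupil, median_age: float, median_mental_age: float,
                    median_reading_age: float) -> List[str]:
    """Return the names of the criteria (1-7) on which the pupil fails."""
    failed = []
    if abs(p.age - median_age) > 1:
        failed.append("chronological misfit")
    if outside_band(p.mental_age, median_mental_age, 1, 2):
        failed.append("intellectual misfit")
    if outside_band(p.reading_age, median_reading_age, 1, 2):
        failed.append("academic misfit")
    if p.mental_age - p.reading_age > 1:
        failed.append("reading disability")
    if p.repeating_grade:
        failed.append("school failure")
    if p.truant_this_term:
        failed.append("truant")
    if p.low_behavior_rating:
        failed.append("behavior or personality problem")
    return failed


def is_mental_hygiene_case(p: Pupil, median_age: float, median_mental_age: float,
                           median_reading_age: float) -> bool:
    """A pupil is counted as a case when he fails on two or more criteria."""
    return len(criteria_failed(p, median_age, median_mental_age,
                               median_reading_age)) >= 2


# Made-up pupil in a classroom whose three medians are all 10 years.
example = Pupil(10.0, 8.5, 9.5, True, False, False)
print(criteria_failed(example, 10.0, 10.0, 10.0))
```

The classroom medians are passed in rather than computed, since the criteria are defined relative to each pupil's own classroom group.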


Community (Sociological) Surveys 


In community, or sociological, surveys as contrasted with educational 
surveys, the problem of adequate sampling looms large. This is true be- 
cause it is more difficult to sample adequately a miscellaneous group of 
people, such as those in a town or rural area, than it is to study pupils 
already classified by ages and grades in school. 

Jenkins (14) discussed this problem in connection with his study of 
the growth and decline of agricultural villages. His definition of an 
agricultural village is a town between 250 and 2,500 in population, situated
in a farm area, and largely dependent on the farming population for its 
continued existence. Since there are approximately 7,000 villages of all 
types in this population range in the United States it would be an almost 
impossible task to study all the agricultural villages. Jenkins chose for his 
study the villages used originally by the Institute for Social and Religious 
Research in 1924. These villages were selected by first making a rough 
count of the number of agricultural villages, thus giving the proportion 
of villages to be selected from each state or region. Within each area 
villages to be included in the study were selected by sociologists and 
others familiar with the area. Since the sample used was set up in 1924 
a question was raised as to its representativeness in 1938 or 1939. 

Terry and Sims made a cross-section study (37) of all aspects of a 
rural community. The study is largely a detailed and intimate description 
of the life of the community. Whereas in most social studies people are 
interviewed and documentary evidence is examined, in this survey the 
surveyors actually participated in the life of the community. The com-
munity was made to feel that the visitors were interested in what the
community was interested in. The writer feels that this method might properly
be called the “life sample” method since the investigators lived short 
samples of time as persons in the community. As an illustration the 
investigation of the religious life of the community may be mentioned. 
The surveyor attended meetings as a visitor-participant. The church 
activities were in no case cramped by the presence of the visitor. The
surveyor sang the songs and took part in the services in the same way 
as any cousin or uncle visiting in the community would do. 

Frederick and Geyer made a community survey at Battle Creek (8). 
The American Council on Education published a guide to community 
surveys (4). 


Regional Testing Programs 


Testing programs over wide geographical areas are carried on in a
variety of ways and for a variety of purposes. Some programs use general
ability tests with high-school freshmen; more often the program consists of
achievement testing for such purposes as motivation of better scholarship or
teaching, evaluation of the curriculum, or provision of diagnostic measures
for use in remedial instruction. A new emphasis on the guidance aspect
of such testing programs is found in the new Iowa undertaking (19). 
This is based on the two following general objectives: 

1. To enable teachers, administrators, and counselors to keep themselves more in- 
timately and reliably acquainted with the continuing educational development of each 
individual pupil, in order that instruction and guidance may be better adapted to his 
peculiar and changing interests, needs, and abilities; and 

2. To provide the school administrator with a more dependable and objective basis 
for the over-all evaluation of the total educational offering of the school, in order that 
any need for curriculum revision may more surely be brought to his attention, and 
that his supervisory efforts may be more wisely distributed. 

The testing program set up to satisfy these two objectives would, according
to the reasoning of the Iowa plan, be one which has the following
characteristics:


1. The tests used should measure as directly as possible the attainment of the 
ultimate objectives of the entire school program. 

2. All the tests should be administered, under standard conditions, to the entire 
student body. 


3. The program must provide for the measurement of growth. This means that the
tests should provide for periodic measurements with the same or with comparable
tests.

4. The tests used should measure the more permanent changes produced in the 
pupils. For this reason it is planned to give the Iowa tests at the beginning of the 
year since no cramming can take place for such examinations and since it is after a 
summer vacation that permanent results of instruction show up best. 

5. The test results should not be usable in the rating of individual teachers. Since 
the tests are given at the beginning of the year the results cannot be used to check 
on the efficiency of instruction of particular teachers or classes. 

6. The test results should be available in a readily interpretable form. This means 
that scores from the various tests can easily be compared through comparable scores 
or through graphical profiles. 

7. The measures derived must be highly comparable from test to test. 

8. Each of the tests used must yield highly reliable measures of the abilities of 
the individual pupil. 


The instruments devised for the testing in Iowa cover the following 
fundamental abilities of pupils in the secondary school: (a) the under- 
standing of basic social concepts, (b) the ability to do quantitative 
thinking, (c) the ability to write correctly, (d) proficiency in the natural 
sciences, (e) the ability to interpret reading materials in the social studies, 
(f) the ability to interpret reading materials in the natural sciences, 
(g) the ability to read literary materials, (h) the ability to use important 
sources of information, and (i) the ability to recognize important word 
meanings. 

The Illinois 1941 State Testing Program (7) was a cooperative pro- 
gram sponsored by the high schools and colleges of the state of Illinois 
to aid high schools in the guidance of seniors and colleges in their admis- 
sion program. The American Psychological Examination and a special 
reading test constructed by the Board of Examiners of the University of 
Chicago were used. 

Both the Iowa and Illinois State Testing Programs are voluntary. 
An important aspect of both programs is the provision made for mechanical 
scoring at a central point. This is important because a tabulation of the 
scores is necessary before norms can be made available. 

The purpose of state department testing programs has been mainly 
accrediting of schools or promotion of pupils to higher institutions. There 
has been a tendency to get away from such purposes because the type 
of testing encouraged—testing specific subjectmatter as laid down in state 
courses of study—was not to be commended. An example of a new em- 
phasis is that shown by the Examination Division of the state of New 
York. In that state there is being gradually introduced a new system of
tests based on fundamental abilities, such as reading and mathematics, 
the scores of which the schools are encouraged to use in instruction and 
guidance. This development in New York is something like that in Iowa 
already described. 


Age-Grade-Progress Studies 


The age-grade survey of Los Angeles County (19) made comparisons 
of age-grade status for the years 1929, 1933, and 1937. This study com- 
pared, in terms of percents, normally placed, over-age, and under-age 
students for the three years in the nonurban schools of Los Angeles County. 
The New York City report for the school year 1940-41 (24) included 
comparative figures of promotion and ages for most of the years 1925 
to 1941. Age-grade-progress for 1940-41 was analyzed at length. The state 
department of education of New York (22) made a study covering the 
progress of pupils in rural districts from Grade VIII through high school, 
a period of five years. Some of the broader aspects of the maladjustment 
between the pupils and the school program were ascertained. This study 
is a trend study based upon two cross sections made at an interval of 
five years. 

There is a general weakness in statistics gathered about children in that 
there is little uniformity in the standard set for normal school entrance 
or for different grades or periods of school life. This militates against
comparisons of different geographical or political regions. The U. S.
Office of Education, through its bulletin on age-grade-progress (31), is
encouraging the standardization of age-grade-progress data. 
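
The effect of a non-uniform standard on age-grade comparisons can be illustrated with a small computation. The sketch below is hypothetical throughout: the entrance age of six, the width of the "normal" age span, and the pupil records are assumptions made for illustration and are not drawn from the studies cited. It shows only that the same pupils yield different percents of normally placed, over-age, and under-age children when the standard changes.

```python
from collections import Counter


def age_grade_status(age, grade, entrance_age=6.0, normal_span=1.0):
    """Classify one pupil against an assumed standard.

    The normal range for grade g is taken as
    [entrance_age + g - 1, entrance_age + g - 1 + normal_span).
    """
    low = entrance_age + (grade - 1)
    high = low + normal_span
    if age < low:
        return "under-age"
    if age >= high:
        return "over-age"
    return "normal"


def status_percents(pupils, entrance_age=6.0, normal_span=1.0):
    """Percent of pupils in each status; `pupils` is a list of (age, grade)."""
    counts = Counter(
        age_grade_status(a, g, entrance_age, normal_span) for a, g in pupils
    )
    n = len(pupils)
    return {k: 100.0 * counts[k] / n for k in ("under-age", "normal", "over-age")}


# Made-up records: (age in years, grade).
records = [(6.5, 1), (7.5, 1), (5.5, 1), (8.2, 3)]
print(status_percents(records))                   # one-year normal span
print(status_percents(records, normal_span=2.0))  # two-year normal span
```

Two regions reporting under the two standards would appear to differ in over-ageness even if their pupils were identical, which is the difficulty a standardized definition is meant to remove.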


Studies of Language 


The study of language has increased considerably during the last few 
years. In general there appear to be three types of approaches in this field. 
The first consists of the consideration of the logical basis of language as 
a carrier of meaning, such as in Carnap’s new book (2) and Korzybski’s
book (16). These authors studied meaning through building up another
language to explain the original language. “By a semantical system (or
interpreted system) we understand a system of rules formulated in a
metalanguage and referring to an object language, of such a kind that the 
rules determine a truth-condition for every sentence of the object language, 
i.e., a sufficient and necessary condition for its truth.” Carnap has expanded 
considerably the “metalanguage” suggested by Korzybski and has set up 
definite rules for using this language in analyzing declarative English 
statements. The metalanguage is a combination of symbols and English. 

Another approach to the study of language is that illustrated by Fries 
(9) who studied intensively the language found in 3,000 letters of a 
federal government department. He first classified the writers into three 
social or class groups in accord with definite information including the 
education and occupation of the writers and in some cases a special con-
fidential report on the family. The language facts for the three groups 
of persons were examined for forms of words, the uses of function words, 
or the uses of word order. Upon the basis of his study, Fries recom- 
mended a new type of approach in the teaching of grammar. 

The third type of language study is the statistical study of the occur- 
rence of word meaning or of errors of language. Davis (5) noted that 
in counting errors in oral speech, the amount of such error can only be 
judged if the relative amount of error can be compared with the amount 
of correct speech. Davis made this comparison by taking actual transcrip- 
tions of the speech of a large number of children. Those interested in word 
counts should read Thorndike’s analysis of word counts (38), which 
discusses the results of various types of approach. Knott (15) reviewed 
somewhat the same problem, as did also Seegers (29). Rinsland (27) 
studied the words from 100,212 separate writings of children in Grades 
I through VIII. The words were analyzed for the different meanings used. 
There are two weaknesses in the method: (a) Children hesitate to write 
words they cannot spell and may therefore leave out words they use 
orally. (b) The words children use may not necessarily be the words they 
should use or the words that they should know when reading. Stone (34) 
studied vocabulary based on children’s words and Horn (13) reviewed re- 
search on adult vocabularies. 


CHAPTER IV


Experimental and Statistical Studies: 
Applications of Newer Statistical Techniques¹


PAUL BLOMMERS and E. F. LINDQUIST 


THE LAST FEW YEARS have witnessed a gradual and rather belated intro-
duction into research procedures in education of the many new and in
some cases very important statistical techniques that have become available 
during the past two or three decades. While many of the educational ap- 
plications of these newer techniques have been valid, not a few have evi- 
denced a faulty understanding of the techniques or a failure to recognize 
the implications of the theoretical assumptions underlying their deriva- 
tions. It is the purpose of this chapter to attempt a general appraisal of 
these applications from the technician’s point of view. The chapter has 
therefore been organized with reference to techniques. Only as many studies 
have been cited with reference to each technique as were needed to yield 
illustrations of important prevalent errors and misconceptions, or, occa- 
sionally, of valid applications. In some of the studies cited the error may be 
of relatively minor consequence and in all other respects the study may 
be very competently conducted. 

The treatment is divided into three general, somewhat overlapping areas: 
problems involving correlation, significance of differences between means, 
and analysis of variance and covariance. 


Significance or Reliability of an Obtained Correlation Coefficient 


Newer techniques for testing the significance or reliability of an obtained 
r have been largely ignored by research workers in education. Four in- 
stances were noted in which this application of the Fisher z-transformation 
was made (6, 12, 33, 55) to test the null hypothesis. While the z-function 
provides a satisfactory test of this hypothesis, the F-test 


$$F = \frac{r^{2}\,(N-2)}{1-r^{2}}$$


provides a somewhat more exact and conservative measure of the signifi- 
cance of an obtained r. No such application of the F-test was observed. 
In a fifth instance (18) the z-transformation was employed to establish
the 5 percent-level confidence interval (fiducial limits) for obtained cor-
relation coefficients.
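
The two tests of an obtained r discussed above, and the fiducial limits based on the z-transformation, can be computed as in the following sketch. It is offered as an illustration rather than as the procedure of any study cited: the function names and the sample values (r = .50, N = 27) are assumptions. The F-ratio is referred to the F distribution with 1 and N-2 degrees of freedom, and the z-ratio uses the usual standard error 1/sqrt(N-3).

```python
import math
from statistics import NormalDist


def f_ratio_for_r(r, n):
    """F = r^2 (N - 2) / (1 - r^2), with 1 and N - 2 degrees of freedom."""
    return r * r * (n - 2) / (1.0 - r * r)


def fisher_z(r):
    """Fisher's transformation z = (1/2) ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def z_ratio_for_r(r, n):
    """Ratio of z to its standard error, 1 / sqrt(N - 3)."""
    return fisher_z(r) * math.sqrt(n - 3)


def fiducial_limits(r, n, level=0.95):
    """Limits for the population correlation, found on the z scale
    and transformed back to the r scale."""
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = 1.0 / math.sqrt(n - 3)
    z = fisher_z(r)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical sample: r = .50 from 27 cases.
r, n = 0.50, 27
print("F ratio :", f_ratio_for_r(r, n))
print("z ratio :", z_ratio_for_r(r, n))
print("limits  :", fiducial_limits(r, n))
```

The F-ratio must still be compared with a tabled value of F for 1 and N-2 degrees of freedom; the Python standard library supplies no F distribution, so that step is left to a table.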


1 For other reviews of statistical methods see Chapters VII and VIII of this issue. 


Averaging Correlation Coefficients


The Fisher z-transformation makes it possible to obtain an improved 
estimate of the population correlation by pooling estimates of the correla- 
tion based on several independent random samples drawn from that popu- 
lation. Four applications of this procedure were encountered (21, 25, 
33, 40). For this procedure to be valid the estimates averaged must be 
based on random samples from the same population (or equally corre- 
lated populations). In some instances the technique concerned has been 
employed without regard to this condition. Rizzo (40) obtained intercor- 
relations for scores yielded by three different scoring procedures applied 
to three different tests for each of eight different grade levels. As nearly 
as could be told from the data presented, at least half of the differences 
in the intercorrelations thus obtained were significant. Yet Rizzo not only 
used the z technique to pool correlations for different techniques within 
a given grade-level and for different grade-levels within a given testing 
technique, but also for all grades and al! testing techniques. A similar 
error was made in the second part of the study. It would appear that Rizzo 
has incorrectly assumed that the z technique permits the averaging of 
correlation coefficients in general, without regard to possible real differences 
between them. 
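
The pooling procedure, and a check on the condition that makes it legitimate, may be sketched as follows. The weighting of each z by N-3 and the chi-square statistic for homogeneity (with k-1 degrees of freedom for k coefficients) are the usual textbook forms and are supplied here as assumptions; they are not quoted from the studies reviewed, and the sample values are invented.

```python
import math


def pooled_r(rs, ns):
    """Pool correlations from independent samples through Fisher's z.

    Returns (pooled r, chi-square for homogeneity, degrees of freedom).
    A large chi-square suggests real differences among the coefficients,
    in which case the pooled value has no clear meaning.
    """
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]  # reciprocal of each z's sampling variance
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    chi2 = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return math.tanh(z_bar), chi2, len(rs) - 1


# Hypothetical coefficients that agree closely: pooling is reasonable.
print(pooled_r([0.48, 0.52, 0.50], [60, 80, 70]))

# Hypothetical coefficients that differ widely: the homogeneity
# statistic is large and the average should not be taken at face value.
print(pooled_r([0.20, 0.80], [103, 103]))
```

The statistic is compared with a tabled chi-square value; for two coefficients (one degree of freedom) the 5 percent point is about 3.84.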

McConnell (25) employed the z-transformation to average the inter- 
correlations between scores on the major subjectmatter subdivisions (phys- 
ics, acoustics and astronomy, and chemistry) of a comprehensive examina- 
tion in physical science. He also averaged the intercorrelations between the 
scores on the various “outcome” parts (vocabulary, knowledge of facts and 
principles, application of facts and principles) of this same examination. 
This procedure would be valid if one could assume that the correlations 
averaged were obtained from independent random samples rather than 
from the same sample, and if one could assume also that these correlations 
were all estimates of the same value. However, neither of these assumptions 
seems reasonable and the procedure is open to question. 

Lannholm (21) reported strong evidence of real differences from school 
to school in reliability coefficients and also in validity coefficients for the 
same test. However, in order that he might obtain a general estimate of 
reliability and validity for the tests he was studying he made use of the
z-transformation to average the coefficients for the various schools involved,
commenting: 


Since this method of averaging correlation values is not valid if true differences 
in correlation exist from one school to the other, the use of this procedure may be 
questionable in this case. Nevertheless, it is believed that the averages thus obtained 
represent perhaps the best possible estimate of the general validity of each of the 
different tests (p. 70). 


The Significance of a Difference between Correlation Coefficients 


The Fisher z-transformation was properly applied in a number of studies 
(3, 26, 29, 48, 52, 58) to test the significance of a difference between 
correlation coefficients obtained from independent random samples. In
some studies, however (4, 13, 25), this test was incorrectly applied to 
related correlation coefficients obtained from the same sample. Edgerton 
and others (13), for a sample of 288 men, obtained the six correlations 
between a total liberalism score and separate liberalism scores in democ- 
racy, economic relations, labor, race, nationalism, and militarism. The 
same six correlations were obtained for 149 women. Sex differences in 
these relationships were then tested by the Fisher z-test. In this application 
the samples are apparently independent and, if random, the test is valid. 
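
For independent samples, the test of a difference between two correlation coefficients runs as in the sketch below. Only the sample sizes (288 men and 149 women) come from the study described; the two correlation values are invented for illustration, and the function name is an assumption.

```python
import math
from statistics import NormalDist


def diff_independent_r(r1, n1, r2, n2):
    """Significance ratio and two-sided probability for r1 - r2,
    where the two coefficients come from independent random samples."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # S.E. of z1 - z2
    ratio = (math.atanh(r1) - math.atanh(r2)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(ratio)))
    return ratio, p


# Invented coefficients for 288 men and 149 women.
ratio, p = diff_independent_r(0.60, 288, 0.30, 149)
print("significance ratio:", ratio)
print("two-sided p       :", p)
```

The standard error used here presupposes independence; it does not apply when both coefficients are computed from the same persons.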

These writers, however, go further. They present 129 correlations for 
men and state that hence there are 8,256 possible differences in correlation 
coefficients for men alone. Since it was not feasible to test each of these
differences individually, the writers provided a table containing the signifi- 
cance ratios (exceeding 2) for differences between r’s based on samples of 
288 cases. They then suggest that by means of this table “. . . the reader 
can estimate the significance of the difference between any correlations 
which he wishes to compare in this study” (p. 262). Inasmuch as the vari- 
ables involved are related and all correlation coefficients are derived from 
the same sample, none of the tests of significance that a reader might choose 
to make by means of this table would be valid.² Even though all these
correlation coefficients had been obtained from independent random sam- 
ples the mass procedure suggested in this study would still not be valid. 
The fallacy involved in effecting such tests en masse may be appreciated 
by noting that if the 129 correlations were derived from independent 
random samples drawn from the same population (i.e., all estimates
of the same value), some 400 of the 8,256 possible differences between 
them would be judged “significant” according to this procedure. 
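
The arithmetic behind this objection is brief. The sketch below counts the possible pairs among 129 coefficients and the number of differences expected to pass a fixed criterion by chance alone when no real differences exist. The two criteria shown, a significance ratio exceeding 2 and the 5 percent level, are assumptions about how "significant" would be read from the table.

```python
import math
from statistics import NormalDist

n_coefficients = 129
n_pairs = math.comb(n_coefficients, 2)  # number of possible differences

# Chance of a significance ratio beyond +/-2 when there is no real difference.
p_ratio_2 = 2.0 * (1.0 - NormalDist().cdf(2.0))

expected_ratio_2 = n_pairs * p_ratio_2  # criterion: ratio exceeding 2
expected_5_percent = n_pairs * 0.05     # criterion: 5 percent level

print("possible differences     :", n_pairs)
print("expected beyond ratio 2  :", round(expected_ratio_2))
print("expected at 5 percent    :", round(expected_5_percent))
```

Either criterion gives a figure in the neighborhood of 400, so that a reader scanning the table would meet several hundred "significant" differences even if every coefficient estimated the same value.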


Significance of a Difference Involving Spearman-Brown Estimates 


Several of the studies cited in the preceding section have been concerned 
with the validity of the Spearman-Brown formula for predicting reliability 
coefficients under circumstances somewhat different from those for which 
this formula was originally intended. Bruce (4), for example, obtained the 
grade point averages of 209 students who remained in college attendance 
for twelve consecutive quarters. She estimated the reliability of the grade 
point averages by correlating those earned by each student in two consecutive
quarters. From this r the reliability of grade point averages over n quarters
was predicted by means of the Spearman-Brown formula. Since the data
were available for the direct computation of the reliability over n quarters,
the obtained r for n quarters was compared with the predicted r for n
quarters by means of the z-test. This application of the z-test is not strictly
valid because the predicted r and the obtained r are not here independent.
Bruce was aware of this shortcoming and tried other methods of testing
the difference concerned.

2 A test of the significance of the difference between related correlation coefficients of the type $r_{y1}$
and $r_{y2}$, both of which have been obtained from the same sample, has been derived by W. G. Cochran.
This test, which has never been published, is

$$t = \frac{(r_{y1} - r_{y2})\,\sqrt{N-3}\,\sqrt{1 + r_{12}}}{\sqrt{2}\,\sqrt{1 - r_{y1}^{2} - r_{y2}^{2} - r_{12}^{2} + 2\,r_{y1}\,r_{y2}\,r_{12}}}, \qquad df = N - 3.$$

An equivalent test was independently derived by Hotelling. See H. Hotelling, “The Selection of Variates
for Use in Prediction with Some Comments on the General Problem of Nuisance Parameters,” Annals of
Mathematical Statistics, 1940, Vol. 11, pp. 271-83.
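
The test for related correlation coefficients given in footnote 2 may be computed as in the following sketch, which assumes the form published by Hotelling. The symbols follow the footnote: r_y1 and r_y2 are the correlations of two variables with a common variable y, and r_12 is the correlation between those two variables, all from one sample of N cases. The function name and the numerical values are invented for illustration.

```python
import math


def related_r_test(r_y1, r_y2, r_12, n):
    """t for the difference r_y1 - r_y2 when both coefficients come from
    the same sample; returns (t, degrees of freedom = N - 3)."""
    # Determinant of the 3 x 3 matrix of correlations among y, 1, and 2.
    det = 1.0 - r_y1**2 - r_y2**2 - r_12**2 + 2.0 * r_y1 * r_y2 * r_12
    t = (r_y1 - r_y2) * math.sqrt((n - 3) * (1.0 + r_12) / (2.0 * det))
    return t, n - 3


# Invented values: two tests correlating .60 and .40 with a criterion
# and .50 with each other, in a sample of 103 cases.
t, df = related_r_test(0.60, 0.40, 0.50, 103)
print("t =", t, "with", df, "degrees of freedom")
```

The intercorrelation r_12 enters the formula directly, which is why a table built for independent samples cannot serve for coefficients drawn from the same persons.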

Remmers and others (10, 36, 37, 38, 39) have contributed a series of 
studies concerned with the hypothesis that a multiple-choice test having 
items with 2r responses is twice as long as a test consisting of the same 
number of r-response multiple-choice items, and that consequently the re- 
liability of the 2r-response test may be predicted from the r-response test 
by means of the Spearman-Brown formula. Though the studies in this 
series are concerned with various types of measuring instruments and vary 
somewhat in approach, they are similar enough that a few descriptive com- 
ments with respect to one of them (10) will suffice for the lot. Denny and 
Remmers (10) divided some 1,000 high-school pupils into four groups. 
Each group was given the same 100-item multiple-choice vocabulary test, 
with the exception that the number of possible responses varied from group 
to group. That is, the test administered to one group consisted of 5-response 
items, that administered to a second group consisted of 4-response items, 
and so forth. For each form of the test the reliability coefficient was com- 
puted by the “odds-evens” method involving the use of the Spearman-Brown 
formula. From each reliability coefficient thus obtained the reliability for 
each of the other three forms was predicted by employing the Spearman- 
Brown formula in the manner suggested by the hypothesis being tested. 
Differences between the obtained and theoretical reliability coefficients
were then tested by the z-test. Since the obtained reliability coefficients and 
the corresponding theoretical coefficients are based on independent samples, 
the test is valid so far as this consideration is concerned. Strictly, however, 
the formula for the standard error of z is designed for correlations com- 
puted directly from random samples, not for those estimated from obtained 
correlations by use of the Spearman-Brown or any other formula. It is 
interesting that in no case was the difference between the z’s as large as 
the standard error of the difference—a result one would hardly expect even 
though the hypothesis were known to be true. 
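
The procedure just described can be sketched in a few lines. This is only an illustration under stated assumptions: the reliabilities and sample sizes are hypothetical, the length factor is taken to be the ratio of the numbers of responses per item (our reading of the hypothesis), and the standard error is the usual one for Fisher z's from independent samples, which, as noted above, is not strictly designed for a coefficient that has itself been predicted by formula.

```python
import math

def spearman_brown(r, k):
    """Reliability predicted for a test whose length is multiplied by k."""
    return k * r / (1 + (k - 1) * r)

def z_ratio(r1, n1, r2, n2):
    """Difference between Fisher z's divided by its standard error
    (independent samples of n1 and n2 cases)."""
    se_diff = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se_diff

# Hypothetical figures, not taken from Denny and Remmers.
r_four_response = 0.80                     # obtained on the 4-response form
predicted_five = spearman_brown(r_four_response, 5 / 4)
obtained_five = 0.85                       # obtained on the 5-response form
ratio = z_ratio(obtained_five, 250, predicted_five, 250)
print(round(predicted_five, 4), round(ratio, 3))
```

A ratio below 1.0, as in this made-up case, shows only that the data are consistent with the hypothesis.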

On the basis of these results the authors wrote, “For vocabulary test 
items varying in number of responses from two to five it is concluded that 
the experimental data completely support the hypothesis” (p. 704). If the 
authors imply that they have established the hypothesis, they have violated 
a fundamental principle of statistical logic. No null hypothesis can be es- 
tablished on the basis of results from random samples. The sample data 
may be consistent with the null hypothesis and may establish its tenability 
as an hypothesis, but they do not completely support it. One may not con- 
clude that there is no real difference simply because the obtained difference 
is not significant. 

Intraclass Correlation 


Two applications of intraclass correlation techniques were encountered 
(28, 34), both of which were concerned with the study of twins. Portenier 
(34) compared twelve pairs of twins with twelve pairs of siblings on a 
number of personality measures. For each of these measures the intraclass 
correlations were obtained for the twins and for the siblings. These r’s 
were transformed into z’s and the differences between these z’s for the 
twins and the corresponding z’s for the siblings were tested for significance. 
To find the standard error of these z’s the author used the formula 
$\sigma_z = 1/\sqrt{n-3}$. This is actually the formula for the standard error of a 
z arising from an interclass r, the proper formula for the S.E. of a 
z arising from an intraclass r being $\sigma_z = 1/\sqrt{n - 3/2}$. It should be noted that 
in using the improper formula Portenier erred on the conservative side. 
Regarding the reliability of the differences obtained Portenier stated that 
none of the significance ratios exceeded 2.0, but that if any ratio greater 
than one is acceptable, certain of the differences are significant. This last 
standard is approximately equivalent to accepting the 32 percent-level of 
confidence as a criterion of significance, which is contrary to all practice. 
The most important point to be noted is that because of the small number 
of pairs involved (12 pairs) any demonstration of real differences between 
z’s (when 2.0 is taken as the critical value of the significance ratio) requires 
that the obtained difference be as great as .72. This condition illustrates 
the futility of attempting to demonstrate the presence of small real differ- 
ences between correlation coefficients derived from small samples (21a). 
Morgan (28) obtained certain measures of eye-movement performance 
in reading for a sample of artificial twin pairs, a sample of fraternal twin 
pairs, and a sample of identical twin pairs. The intraclass coefficients of 
correlation were reported for each of the three types of pairs on each of 
the measurements taken. The manner of computing the P.E. of the obtained 
coefficients was not reported and, although the necessary data were given, 
the writers were unable to duplicate certain of Morgan’s reported P.E. 
values. The P.E. of the distribution of obtained intraclass r’s about the 
true coefficient, ρ, is ordinarily given by $\mathrm{P.E.}_r = .6745\,(1-\rho^2)/\sqrt{N}$ 
when the number in each class is 2 and when N is sufficiently large. This 
formula is limited not only by the fact that it involves a population 
parameter but also by the fact that the distribution of obtained r’s about ρ 
becomes increasingly skewed as r approaches +1.00. Since some of the r’s 
found were of considerable size, this study illustrates an application of 
the P.E. which is often virtually meaningless but which unfortunately is 
common in educational research. 

Correlation Coefficients from Relatively Homogeneous Subsets 

When a bivariate population is composed of subpopulations such that 
the subpopulations are more homogeneous with reference to one or both 
variables than the total population, and such that the correlation between 
the variates is the same for all subpopulations, then, on the basis of random 
samples (groups) drawn from the various subpopulations, a best estimate 
of the common “within group” correlation may be obtained by the methods 
of analysis of covariance. Osborn (30) employed this procedure properly 
to estimate a number of correlation coefficients, such as those between 
performance on an achievement test and the shift in attitude toward a 
certain social issue resulting from a propagandizing treatment. His subjects 
consisted of the members of intact class groups selected from various school 
systems. On the grounds that attitude toward the issue concerned tends 
to be less varied within a school community than in the total population 
of school children, a given school may be regarded as a sample from a 
relatively homogeneous subpopulation. Hence, Osborn used the formulas 
of analysis of covariance to obtain the “within class” correlations, thus 
eliminating from the coefficients whatever effect between-school differences 
may have had upon them. 
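
A minimal sketch of such a “within group” estimate follows. It pools the sums of squares and products of deviations taken about each group’s own means, which is the standard within-groups computation; the function name and the two-school data are hypothetical and are not Osborn’s.

```python
import math

def within_group_correlation(groups):
    """Pooled within-group r. `groups` is a list of groups, each a list
    of (x, y) pairs; deviations are taken from each group's own means."""
    sxx = syy = sxy = 0.0
    for g in groups:
        n = len(g)
        mean_x = sum(x for x, _ in g) / n
        mean_y = sum(y for _, y in g) / n
        for x, y in g:
            sxx += (x - mean_x) ** 2
            syy += (y - mean_y) ** 2
            sxy += (x - mean_x) * (y - mean_y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: two schools with the same within-school relationship
# but very different school means on x.
school_a = [(1, 1), (2, 2), (3, 3)]
school_b = [(11, 1), (12, 2), (13, 3)]
r_within = within_group_correlation([school_a, school_b])
# Treating all pupils as one group gives the ordinary total correlation.
r_total = within_group_correlation([school_a + school_b])
print(round(r_within, 3), round(r_total, 3))
```

In this contrived case the between-school difference masks a perfect within-school relationship, which is the kind of effect the within-class coefficients are meant to remove.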


Kuder-Richardson Reliability Formulas 


The use of the Kuder-Richardson formulas to estimate the reliability 
of a test is rapidly increasing. Andrus, Cronbach, and Hastings (2, 8, 18) 
each used the Kuder-Richardson Formula (20). Froehlich (16) presented 
a slightly different form of the Kuder-Richardson Formula (21). 
This particular formula is based in part on the assumption that all the 
items of the test are of uniform difficulty. Froehlich made a rough 
empirical check on the cruciality of this assumption by administering the 
five parts of the Wisconsin Achievement Test to 2,000 individuals. He 
obtained the reliability coefficients for each of the five part scores and for 
the total score by both the split-halves procedure and the Kuder-Richardson 
Formula (21). The difficulty of the items on this test ranged 71 points 
on a 100-point scale with a standard deviation of 16. Yet the differences 
between the indexes of reliability were relatively slight, the largest dif- 
ference being .058. Moreover, the two indexes correlated perfectly. 
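
For orientation, the two formulas can be sketched as below in their usual textbook forms (not the variant Froehlich presented, which is not reproduced in this review). Items are assumed to be scored 0 or 1, the variance of total scores is computed with divisor N, and the small response matrix is hypothetical.

```python
def _score_stats(item_matrix):
    """Return (k items, mean total score, variance of total scores)."""
    n = len(item_matrix)
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n
    return k, mean, variance

def kr20(item_matrix):
    """Formula 20: uses the proportion passing each item."""
    k, _, variance = _score_stats(item_matrix)
    n = len(item_matrix)
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / variance)

def kr21(item_matrix):
    """Formula 21: assumes all items equally difficult, so only the
    mean and variance of total scores are needed."""
    k, mean, variance = _score_stats(item_matrix)
    return k / (k - 1) * (1 - mean * (k - mean) / (k * variance))

# Hypothetical responses: rows are examinees, columns are items.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(responses), kr21(responses))
```

When item difficulties vary, as here, Formula 21 gives the lower value; the two agree only when every item has the same difficulty.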

The Kuder-Richardson formulas are based on assumptions which are 
far from satisfied in any applications which have come to the reviewers’ 
attention, and hence it is difficult to say what it is that the r’s thus obtained 
actually measure. Perhaps what they measure is better described as “in- 
ternal consistency” than as “reliability,” as the latter term has usually been 
employed. The Kuder-Richardson, the “odds-evens,” and the “equivalent 
forms” techniques do not describe exactly the same characteristic of a 
test. There is a real need for further clarification of the issues involved 
in the choice between these techniques. 

Testing the Significance of the Difference between 
Means of Independent Random Samples 


Applications of the t distribution (Student’s distribution) to test the 
significance of the difference between means of independent random 
samples are becoming increasingly common (1, 6, 14, 15, 27, 32, 35, 
54, 57, 59). Pintner (32) made extensive use of the t-test in studying the 
differences between normal hearing and hard-of-hearing individuals of 
various age groups with respect to certain measures of personality traits. 
The means which he thus compared were based on independent samples 
of varying size. On the whole his samples were large, so large in fact 
that in most instances it was necessary to enter the t-table for an 
infinitely large number of degrees of freedom. As a consequence the 
traditional procedure * would in theory have been somewhat superior to the 
t-test * because the latter assumes homogeneity of variance whereas the 
former does not. 

According to Fisher (14a) the value of t yielded by the latter test tends 
sometimes to be increased by a difference in variance between the 
populations from which the samples are drawn.⁵ It would be well to distinguish 
more clearly between these two tests and between the assumptions 
underlying their use. The traditional test is generally to be preferred for 
large samples; for small samples the t-test must be used. A particularly 
serious error is that of employing the traditional procedure with small 
samples and interpreting the significance ratio thus obtained as though 
it were the t statistic. Wolfe (57), for example, used the traditional 
procedure in comparing a small group of average readers with a small 
group of retarded readers with reference to such factors as laterality, 
audition, vision, verbal association, and adjustment. She interpreted the 
critical ratios thus obtained as t’s, entering the t-table for seventeen degrees 
of freedom. Wolfe dealt with equal sized groups, in which special case 
the procedures yield identical significance ratios. However, the proper 
number of degrees of freedom is thirty-four rather than seventeen, since 
the groups compared each involved eighteen subjects. It is less likely 
that this error would have occurred had the proper formula for t been 
employed. Another instance in which the traditional procedure was 
employed and interpreted as t was found in a study by Vaughn (54). Here 
again, however, the groups upon which the comparison was based were 
equal in size and hence the results are valid. 

⁵ Suppose, for example, that $N_1 = 500$, $N_2 = 1000$, $S_1^2 = 32$, and $S_2^2 = 4$. Here $F = 8.01$ whereas 
$F_{.01} = 1.19$. With this large difference between both the sample variances and frequencies, the value of 
t based on these samples and for any difference in means will be 1.373 times as great as that of the 
significance ratio yielded by the traditional procedure. On the other hand if est $\sigma_1^2$ = est $\sigma_2^2$ the two 
procedures yield identical results regardless of the difference between $N_1$ and $N_2$. 
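
The distinction between the two procedures may be made concrete with a short sketch. It assumes that the “traditional procedure” is the critical ratio in which each sample variance is kept separate, and that unbiased variance estimates (divisor N − 1) are used in both procedures; the scores are hypothetical.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def unbiased_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def critical_ratio(a, b):
    """'Traditional' ratio: the two variances are not pooled."""
    se = math.sqrt(unbiased_variance(a) / len(a) + unbiased_variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def pooled_t(a, b):
    """t for independent samples, assuming equal population variances."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * unbiased_variance(a) + (nb - 1) * unbiased_variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df

# Hypothetical scores for two groups of eighteen subjects each.
group_a = [float(x) for x in range(18)]
group_b = [2.0 * x + 1.0 for x in range(18)]
ratio = critical_ratio(group_a, group_b)
t, df = pooled_t(group_a, group_b)
print(round(ratio, 4), round(t, 4), df)
```

With equal group sizes the two ratios coincide and the t-table is entered with N₁ + N₂ − 2 degrees of freedom; with unequal sizes and unequal variances the two ratios differ.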

The application of an estimate of error designed for use with inde- 
pendent samples leads to biased results when applied to samples which 
are not independent. Researchers working with paired samples have gen- 
erally taken this fact into account. However, it has frequently been ignored 
by workers dealing with groups which have been equated with respect 
to some control variable by simply making the means and standard devia- 
tions approximately equal for the two groups. Young (59), studying the 
relative effectiveness of different lengths of practice periods, employed 
three procedures with reference to four phases of school learning. The 
twenty-three subjects which comprised each of the three experimental 
groups were closely matched with respect to age, intelligence, and initial 
ability in the particular phase of learning being considered. Young stated 
that the significance of the difference between group means was tested 
by the t-test. Recomputation of the reported probabilities of the t’s obtained 
for various differences between means showed that the t-test employed was 
that designed for use with independent samples. This particular test is not 
appropriate here. 


Testing the Significance of the Difference between 
Means of Paired Measures 


Applications of the t-test of the significance of the difference between 
means of paired measures are not common in the field of education (6, 9, 
24, 25, 30). A particularly interesting application of this t-test is found 
in a study by Osborn (30). Osborn sought to determine whether the 
change in attitude toward a certain social issue was significantly less 
for individuals who had been taught to be on guard against certain tech- 
niques of propaganda. Osborn’s subjects were the members of intact 
classes of school pupils selected from a number of schools. Each school 
yielded two classes, one of which was used as an experimental or “taught” 
group and the other of which was used as the control or “untaught” group. 
Osborn reasoned that, apart from the effects of the experimental treat- 
ment, the pupils in a single school would be relatively homogeneous in 
their attitude toward the issue concerned, and that hence the whole study 
had to be regarded as a sample of schools rather than as a sample of 
pupils. His analysis was therefore concerned with class means rather 
than with individual measures. Since the classes all involved approxi- 
mately the same number of individuals the unweighted means were used. 
These means were paired by schools and the difference found for each 
school. The t-test for a difference between means of paired measures was 
then appropriately used to test the significance of these differences. 
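
The test for paired measures can be sketched as follows, under the assumption that the unit of analysis is the school and each school contributes one difference between its two class means. The class means are invented for illustration and are not Osborn’s data.

```python
import math

def paired_t(first, second):
    """t for the mean of paired differences; df = number of pairs - 1."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t = mean_diff / math.sqrt(var_diff / n)
    return t, n - 1

# Hypothetical mean attitude shifts, one pair of classes per school.
taught = [2.1, 1.8, 2.5, 1.9, 2.2]
untaught = [2.9, 2.4, 3.1, 2.2, 3.0]
t, df = paired_t(taught, untaught)
print(round(t, 3), df)
```

The degrees of freedom depend on the number of schools, not on the number of pupils, which is the point of treating the study as a sample of schools.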

Simple Analysis of Variance 


So that there may be no confusion in terms, it may be well first to 
describe briefly what we mean by “simple analysis of variance.” Given a 
set of observations which may be classified into groups, the sum of the 
squared deviations of all observations from their general mean may be 
analyzed into two components. One of these is the sum (for all groups) 
of the squared deviations of the observations from their respective group 
means; the other is the weighted sum of the squared deviations of the 
group means from the general mean. Each of these components, divided 
by the appropriate degrees of freedom, will yield an unbiased estimate of 
a population variance, on the assumption that all groups were inde- 
pendently drawn at random from equally variable populations. On the 
further assumption that in these populations the observations are normally 
distributed, one may use the F-ratio between these two estimates of variance 
to test the hypothesis that the population means are equal. Any instance 
in which a test of this type is based on an analysis of variance into two 
components will here be referred to as an instance of “simple” analysis 
of variance, as distinguished from the more complex case in which the 
analysis divides into several components. 
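
The partition just described may be sketched as follows. The function and the three small groups are hypothetical; the sketch only computes the two components, their degrees of freedom, and the F-ratio, and leaves the reference to an F-table to the reader.

```python
def simple_anova(groups):
    """Partition the total sum of squares into within-group and
    between-group components and return the F-ratio."""
    observations = [x for g in groups for x in g]
    grand_mean = sum(observations) / len(observations)
    ss_within = 0.0
    ss_between = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)
        # Weighted by group size, as in the description above.
        ss_between += len(g) * (group_mean - grand_mean) ** 2
    df_between = len(groups) - 1
    df_within = len(observations) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio

# Hypothetical scores for three groups.
ss_b, ss_w, df_b, df_w, f_ratio = simple_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(ss_b, ss_w, df_b, df_w, f_ratio)
```

The “between groups” mean square is the numerator and the “within groups” mean square is the denominator, or error term.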

Stuit and Donnelly (51) compared the scores made four years previ- 
ously on various aptitude and entrance proficiency tests by individuals 
graduating from college. In one phase of this study Stuit and Donnelly 
grouped measures of mathematical aptitude for these individuals into nine 
groups, according to the major field subsequently pursued, and applied 
the methods of simple analysis of variance to test the hypothesis that 
the group means were equal. This analysis was repeated four additional 
times, once for each of the three other aptitudes or skills measured and 
once for the composite of the four measures. Except for the fact that the 
assumption of normality was not satisfied—an assumption which is 
usually not crucial—this application seems sound. 

Edgerton and others (13), Evans and Wren (14), Mellens (26), and 
Williamson and Bordin (56) employed analysis of variance when cor- 
relation techniques would have been more appropriate. Evans and 
Wren (14) placed 148 students into four groups on the basis of test 
scores. The grade point averages for these individuals were analyzed so 
as to yield a “between groups” variance and a “within group” variance. 
It was found that the mean grade point averages of these groups varied 
significantly. This study produced quantitative values for both variables, 
and yet coarse categories were imposed on the data in order to use the 
methods of analysis of variance. Such data, in general, are more ap- 
propriately analyzed by correlation procedures. The scheme of correla- 
tion analysis based on the unbiased correlation ratio set forth by Peters 
and VanVoorhis (31a), the so-called Epsilon technique, is admirably 
suited to the analysis of data of this type. The Epsilon technique has 
the advantage of not only providing a test of significance equivalent to 
that provided in analysis of variance, but also providing a readily 
interpretable estimate of the strength of the relationship. Where the assumption 
of linearity seems justified the ordinary product-moment correlation serves 
the same purpose. 

Part V of the study by Edgerton and others⁶ (13) is of interest for 
another reason also. In one phase of the investigation simple analysis of 
variance was applied twice to the same data but with different groupings 
of the subjects. The variable analyzed was “liberalism” as measured 
by the Progressive Education Association’s “A Scale of Beliefs.” In one 
instance the subjects were grouped into six categories on the basis of 
the amount of their mothers’ education; in the other the original six 
categories were combined into two coarser categories according to whether 
the mother did or did not attend college. A significant F was found in 
the second analysis but not in the first. This procedure is analogous to 
computing a product-moment r between two variables, and subsequently 
imposing a dichotomy on one variable and computing a biserial r for the 
same data. The effect of grouping errors might be to make the biserial r 
significant and the product-moment r nonsignificant, but the test based 
on the finer categories would ordinarily be considered the more de- 
pendable. For similar reasons the F-test based on the six categories would 
provide the better test of the null hypothesis in this study, and the test 
based on two categories seems redundant and pointless. 

Lamson (20), by means of analysis of variance, reached interesting 
and most unexpected conclusions relative to differences between five 
fourth-grade classes in IQ and in educational age. She concluded that 
the five classes differed with high significance insofar as mean IQ’s were 
concerned, but that they did not differ significantly insofar as 
educational age means were concerned. These conclusions resulted from a 
rather interesting error. In analyzing the IQ measures, Lamson obtained 
a “within class” variance of 48.92 and a “between classes” variance of 
2.56. Obviously, the means of the IQ measures taken jointly do not 
differ significantly from class to class. The F-test need not, in fact cannot, 
be made since the variance to be tested (the “between classes” variance) 
is smaller than the error variance (the “within class” variance). Lamson, 
nevertheless, determined an F-ratio using the “between classes” variance 
as the denominator or error term and concluded that: “. . . the variation 
in IQ is larger than would be expected in similar samplings ninety-nine 
times out of a hundred as the result of chance factors. The composite group 
lacks homogeneity with reference to intellectual ability” (p. 177). A 
conclusion which might validly have been drawn from this F-test applied 
by Lamson is that some real factor had operated to make the differences 
between class means considerably smaller than would result by chance. 
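
The arithmetic behind this criticism is brief. The two variances are those quoted above; which quotient Lamson actually referred to the F-table is inferred from the description given here, not from her tables.

```python
within_class = 48.92      # error variance reported for IQ
between_classes = 2.56    # variance to be tested

# Proper ratio: variance tested over error variance.
f_proper = between_classes / within_class
# Inverted ratio: "between classes" variance used as the error term.
f_inverted = within_class / between_classes

print(round(f_proper, 3), round(f_inverted, 2))
```

The proper ratio is far below 1.0, so no test of differences between class means is called for; only the inverted ratio is large.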


⁶ Part V was done by W. A. B. Schrader. 

Only one application of simple analysis of variance to experimental 
data was encountered in the educational periodicals checked. Lohmeyer 
and Ojemann (22) were concerned with the relative effectiveness of three 
methods of auditory presentation. A pre-test was given over the material 
presented and the three methods groups were equated in part with refer- 
ence to this measure. This same test was used again as a final or post- 
treatment test, the criterion measures analyzed being the differences be- 
tween corresponding pre-test and post-test scores. No allowance was made 
for the effect of equating groups, hence a simple analysis in this case 
affords a definitely biased test of the hypothesis that the methods of 
presentation are equally effective. Since the preliminary measures were 
available, an analysis of covariance might have provided an unbiased 
test of this hypothesis. 


The t-Test as Used in Conjunction with an 
Analysis of Variance or of Covariance 


Sometimes, in a simple analysis of variance of a sample consisting of 
several groups, it is desired to test the significance of the difference between 
the means of a particular pair of these groups. In general, it is defensible 
to use the t-test for this purpose only after the F-ratio of the “between 
groups” and “within groups” variances indicates that taken jointly the 
group means differ significantly. The research worker should guard 
against the temptation to apply the t-test to certain pairs of means 
(particularly those selected because they show a relatively large difference) 
when the F-test has already shown that the observed variation in the 
means may reasonably be attributed to chance. This mistake was made in a 
study by Rubin-Rabson (42). Long and Welch (23) and Evans and 
Wren (14) applied the t-test to selected pairs of means before applying 
the over-all F-test. A valid application of the t-test in such situations 
may be found in a study by Lohmeyer and Ojemann (22). An example of 
the t-test properly applied in conjunction with an analysis of covariance 
may be found in Spencer’s study (50) dealing with the retention of orally 
presented materials. 


Complex Applications of Analysis of Variance 


The analysis of variance into three components is found in a study by 
Cast (7), which was concerned with the problem of evaluating different 
methods of marking themes. Cast submitted forty themes to twelve judges 
each of whom marked each theme by the same prescribed method. Cast 
obtained three independent estimates of the population variance, namely, 
an estimate based on the differences between themes ($V_t$), an estimate based 
upon the differences between judges ($V_j$), and an estimate based on the 
residual or remaining differences between the 40 x 12 ratings after effects 
due to themes and judges had been eliminated ($V_r$). Reasoning, then, 
that a system of marking which did not differentiate significantly between 
themes was valueless, Cast concluded that a significant $F = V_t/V_r$ is one 
of the criteria of a good marking technique. He next reasoned that a good 
marking system does not differentiate between judges, and hence a 
significant $F = V_j/V_r$ would be indicative of a poor marking system. Finally 
Cast reasoned that a good marking system is one which reduces random 
errors to a minimum, and that, therefore, a significant $F = V_r/V_{total}$, 
where $V_{total}$ is obtained by dividing the total sums of squares by the total 
degrees of freedom, would indicate a poor marking system. In this last 
step he made an error: that of trying to test an F-ratio between variances 
that are not independent of one another. 
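
A three-component analysis of this kind, for a table of themes by judges with one mark per cell, might be sketched as follows. The variable names and the 3 x 3 table of marks are hypothetical; Cast’s own layout was 40 themes by 12 judges.

```python
def two_way_anova(table):
    """table[i][j] is the mark given to theme i by judge j.
    Returns the three variance estimates, the 'total' variance,
    and the two F-ratios that have an independent error term."""
    rows, cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (rows * cols)
    theme_means = [sum(row) / cols for row in table]
    judge_means = [sum(table[i][j] for i in range(rows)) / rows for j in range(cols)]
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_themes = cols * sum((m - grand) ** 2 for m in theme_means)
    ss_judges = rows * sum((m - grand) ** 2 for m in judge_means)
    ss_residual = ss_total - ss_themes - ss_judges
    v_themes = ss_themes / (rows - 1)
    v_judges = ss_judges / (cols - 1)
    v_residual = ss_residual / ((rows - 1) * (cols - 1))
    v_total = ss_total / (rows * cols - 1)
    return {
        "v_themes": v_themes, "v_judges": v_judges,
        "v_residual": v_residual, "v_total": v_total,
        "f_themes": v_themes / v_residual,
        "f_judges": v_judges / v_residual,
    }

# Hypothetical marks: three themes (rows) by three judges (columns).
result = two_way_anova([[4, 5, 6], [5, 7, 6], [8, 9, 10]])
print(result)
```

The total sum of squares contains the residual sum of squares as one of its parts, so the residual and “total” variances cannot be independent, and their quotient does not follow the F distribution.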


Cast repeated this analysis with three other schemes of marking using 
the same themes and the same judges, a period of approximately two 
months separating each marking. He ranked the schemes thus studied on 
the basis of the three F-ratios described, a somewhat questionable 
procedure in view of the lack of any evidence of the comparability of the 
F’s or of the reliability of the ranking. A similar form of analysis may 
be found in a study by Sells, Loftus, and Herbert (49). 


A study by Owens (31) of intra-individual differences versus inter- 
individual differences in motor skills illustrates an unusual variety of 
applications of analysis of variance and other techniques, some of which 
seem as questionable as they are ingenious and interesting. Owens ad- 
ministered each of six tests of motor skills eight times to each of fifteen 
individuals, in an effort to determine the relative magnitude of (a) dif- 
ferences from trait to trait within the same individual and (b) differences 
from individual to individual for the same trait. Among other things, 
he analyzed the variance (T.V.) of scores obtained from seven adminis- 
trations of a test of a motor skill into the “between administration” 
(R.V.), “between individuals” (I.D.), and “remainder” (error) compo- 
nents, and then indicated in terms of percents the contribution of each 
factor to the total variance. In his Table 1, for instance, he stated that 
68 percent of the variance in “block packing” scores is due to differences 
between individuals, 16 percent to differences between administrations, 
and 16 percent to error. The contribution of individual differences was 
apparently obtained by deducting the error mean square from the 
mean square for individual differences, and dividing the result by the 
number of individuals. The contribution of administrations was 
presumably found by deducting the error mean square from the 
administration mean square and dividing the result by the number of 
administrations. (Owens does not state specifically what divisor was used.) These 
contributions were presumably then added to the error mean square and 
each expressed as a percent of this total. This procedure is commonly 
used in genetics (49a) to evaluate the contributions of various factors 
to variance, and may find rather wide applications in education. When 
based on a small number of categories, however, a percent estimate of this 
kind is likely to be highly unreliable and should not be interpreted too 
literally. 
Owens in this fashion determined the percent contribution of individual 
differences to total variance for each of the six tests, and then averaged 
these percents to get an over-all measure of the relative importance of 
individual differences. He similarly analyzed the variance of the scores 
on all tests in all administrations for each individual into “between tests” 
(T.D.), “between administrations” (R.V.), and “error,” and for each 
individual expressed as a percent the contribution of trait differences 
to the total variance involved. He then averaged these percents to secure 
an over-all measure of the relative importance of trait differences. He 
next tested the significance of the difference between these mean 
percentages and concluded, because the difference was not significant, that 
trait differences and individual differences are of the same magnitude. One 
cannot conclude, however, that because trait variations and individual 
variations represent equal proportions of different things they are 
therefore equal to one another. Individual differences were expressed as 
a percent of individual differences plus administration differences plus 
error. Trait differences were expressed as a percent of trait differences 
plus administration differences plus error. No explanation is given of 
how these sums can be considered to represent the same total variance 
(apparently one would have to assume that which is to be proved, i.e., 
that trait variations are equal to individual variations), while the figures 
given show definitely that they are not equal. 

To summarize the analyses of the type first described, Owens totaled 
the sums of squares and degrees of freedom from the separate analyses 
for the six tests, to secure a table (Table 8) in which there are reported 
78 degrees of freedom for “between individuals” and 72 degrees of freedom 
for “between administrations” (repetition). How one can have 78 degrees 
of freedom for “between individuals” in an analysis involving only 15 
individuals or how on any consistent basis the 72 degrees of freedom are 
obtained for differences between administrations, are other mysteries that 
require explanation. Owens similarly summarized (Table 9) the data 
for the second type of analysis for the fifteen individuals, and in each 
summary table applied an F-test on the basis of the degrees of freedom 
thus obtained. 

It should be noted that even though the percents obtained were com- 
parable from the two summary tables, one may not conclude from a non- 
significant difference that trait differences and individual differences are 
of the same magnitude. To do so is to make the common mistake of 
attempting to prove a null hypothesis. Owens is guilty of this latter mistake 
at another point also when, having tested the homogeneity of a number of 
variances, he says that, “In all cases, the value of L failed to 
reach over the 5 percent-level, which means that the variances within groups 
are the same” (p. 309). 

Owens’ study involved several other complex procedures which, because 
of space limitations, cannot be discussed here. On the whole, his study is 
commendable for ingenuity but is open to criticism because of inadequate 
reporting of procedures and questions of technical logic. It would seem 
advisable that educational research workers feel their way slowly in be- 
coming acquainted with the possibilities of analysis of variance and 
be content for a while with relatively straightforward interpretations of 
relatively simple applications. 

Evans and Wren (14) stated that they applied “. . . analysis of variance 
technique with two criteria of classification, Thinking I-E (Introversion- 
Extroversion) and Miller Analogies scores...” (p. 51). The trait 
studied was scholastic achievement. Four classes were formed with ref- 
erence to each variable by dividing the distribution of scores into fourths. 
Aside from the fact that partial correlation techniques might have been 
more appropriate, this study is cited because of the apparent pointless- 
ness of one of the tests applied. On page 51, Evans and Wren stated, 
“When the variance within the Analogies . . . groups was considered, 
there was insufficient evidence to determine any difference in the scholastic 
achievement of the four Thinking I-E . . . groups.” This is interpreted by 
the reviewers to mean that the significance of the “between Thinking I-E 
quarters” mean square was tested with reference to the “within Miller 
Analogies quarters” mean square by the F-test. This further illustrates 
the tendency to give inadequate consideration to the terms used in an 
F-test. The denominator in all such tests must be meaningful as an “error” 
term with reference to the numerator—the hypothesis tested being the 
hypothesis that the numerator variance is entirely attributable to chance 
fluctuations of the type measured by the denominator. 

Gabel’s study (17), which was concerned with the relative effects of 
definite and indefinite quantitative terms upon comprehension and 
retention of social studies material, illustrates a higher order analysis. The 
subjects were the pupils in four grades in each of nine different school 
systems. The total sum of squares was analyzed into “between modes of 
presentation,” “between grades,” “between schools,” and all possible 
interaction components. The triple interaction component was used to 
form the error mean square. The only effect tested for significance was 
the “between modes of presentation” effect, all other effects having been 
determined for the sole purpose of eliminating them from the error term. 

In a study of the factors affecting the efficiency of inductive reasoning, 
Long and Welch (23) carried out an analysis of variance with reference 
to the three variables: “subjects,” “degree of abstractness,” and “number 
of antecedents.” They obtained all possible interaction terms and used 
the triple interaction variance as the error term. Both “abstractness” and 
“antecedents” effects were tested and found to be significant. Then, in 
order to determine the relative potency of these two factors, they tested 
the F-ratio between the “abstractness” and “antecedents” variances. Since 
the d.f. were the same for both variances this may have served a practical 
purpose in this particular case; this procedure, however, is generally in- 
valid. A means of estimating the relative contributions of the various 
factors to the total variance which may be used when an appropriate
error term is available is that described in the preceding discussion of 
the study by Owens. 

Rubin-Rabson applied the Graeco-Latin square design in a series of seven 
studies (41, 42, 43, 44, 45, 46, 47) dealing with memorization of piano 
music. The second study, which is typical of this series, was concerned with 
a comparison of massed and distributed practice. Nine subjects, nine selec- 
tions, and three methods were involved. Each subject memorized the nine 
selections, each of the three methods being used for a different set of three 
selections. By means of nine “3 x 3” Graeco-Latin squares, Rubin-Rabson 
balanced the three method sequences for all nine subjects and at the same 
time balanced the compositions. Obviously the balancing of the method 
sequences served also to balance the methods. 
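
For readers unfamiliar with the design, a single "3 x 3" Graeco-Latin square can be sketched as follows. The construction by modular arithmetic and the function names are illustrative assumptions; the source does not say how Rubin-Rabson built her nine squares or how rows, columns, and symbols were assigned. In the sketch the first member of each pair may be read as a method and the second as a composition.

```python
def graeco_latin_square(n=3):
    """Superimpose two orthogonal Latin squares of side n (intended for n = 3).

    Cell (r, c) holds the pair (latin, greek). The rule below gives
    orthogonal squares for n = 3; it is not a general construction.
    """
    latin = [[(r + c) % n for c in range(n)] for r in range(n)]
    greek = [[(r + 2 * c) % n for c in range(n)] for r in range(n)]
    return [[(latin[r][c], greek[r][c]) for c in range(n)] for r in range(n)]

def is_graeco_latin(square):
    """Check the two balancing properties of a Graeco-Latin square."""
    n = len(square)
    for part in (0, 1):
        # Each symbol occurs exactly once in every row and every column.
        for r in range(n):
            if len({square[r][c][part] for c in range(n)}) != n:
                return False
        for c in range(n):
            if len({square[r][c][part] for r in range(n)}) != n:
                return False
    # Every (latin, greek) combination occurs exactly once in the square.
    pairs = {square[r][c] for r in range(n) for c in range(n)}
    return len(pairs) == n * n

square = graeco_latin_square(3)
for row in square:
    print(row)
print("balanced:", is_graeco_latin(square))
```

The check makes the balancing explicit: each symbol of either alphabet meets every row, every column, and every symbol of the other alphabet exactly once.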

The advantage in such situations of the Latin or Graeco-Latin square 
designs is that they permit a method of analysis of variance which
recognizes, by reducing the error term, that the counterbalancing
increases the precision. (Research workers using other counterbalanced
designs have often incorrectly analyzed the results just as if simple random 
sampling techniques had been employed.) This advantage, however, is not 
obtained without a price. The balancing of certain of the variables pre- 
cludes the determination of interaction effects, some of which may be of 
as much interest as the main effects themselves. Hence, the use of this tech- 
nique involves the assumption that there are no real interaction effects 
present or other more complex assumptions. In terms of the study under 
discussion this is tantamount to assuming that whatever method works best 
for one selection works best for all selections, or that whatever method 
works best for one subject works best for all subjects, or that whatever 
selection is most readily learned by one subject is also the selection most 
readily learned by the other subjects. It seems unreasonable to suppose 
that such assumptions are closely satisfied in this particular situation, al- 
though they may be nearly enough met so that a pooled error term may 
serve adequately for a rough test of the main effects. Before adopting the 
Latin or Graeco-Latin square designs, therefore, research workers in edu- 
cation should consider carefully the assumption implied, namely, that there 
are no real interactions, and should consider also the possibility of using 
a factorial design that will permit an evaluation of possible interactions. 
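
The gain in precision, and its price, may be seen in a small sketch of the analysis of a single Latin square. The function name and the scores are hypothetical. The residual is valid as an error term only under the assumption discussed above, that no real interactions are present, since any interaction is necessarily left in the residual.

```python
def latin_square_anova(y, treatment):
    """y[r][c] is the score in row r, column c; treatment[r][c] is its label."""
    n = len(y)
    cells = [(r, c) for r in range(n) for c in range(n)]
    grand = sum(y[r][c] for r, c in cells) / (n * n)

    def ss_for(label_of):
        # Sum of squares between the n groups defined by label_of(r, c).
        groups = {}
        for r, c in cells:
            groups.setdefault(label_of(r, c), []).append(y[r][c])
        return sum(len(g) * (sum(g) / len(g) - grand) ** 2
                   for g in groups.values())

    ss = {
        "rows": ss_for(lambda r, c: r),
        "columns": ss_for(lambda r, c: c),
        "treatments": ss_for(lambda r, c: treatment[r][c]),
    }
    ss_total = sum((y[r][c] - grand) ** 2 for r, c in cells)
    # Whatever is left, including any interaction, falls in the residual.
    ss["residual"] = ss_total - sum(ss.values())
    df = {"rows": n - 1, "columns": n - 1, "treatments": n - 1,
          "residual": (n - 1) * (n - 2)}
    # Error mean square if the balancing were ignored (only treatments removed).
    naive_error_ms = (ss_total - ss["treatments"]) / (n * n - n)
    return ss, df, naive_error_ms

# Hypothetical 3 x 3 square: rows = subjects, columns = sessions.
methods = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
scores = [[10, 14, 15], [13, 17, 11], [16, 12, 15]]
ss, df, naive_ms = latin_square_anova(scores, methods)
print("Latin-square error mean square:", ss["residual"] / df["residual"])
print("error mean square ignoring the balancing:", naive_ms)
```

No separate row-by-treatment or column-by-treatment component can be extracted from such a layout; a factorial arrangement would be needed for that purpose.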

Rubin-Rabson, apparently employing a procedure similar to that used by 
Owens, obtained an estimate of the potency of each variable in terms of 
a percent of the total variability, and placed considerable emphasis upon 
this estimate in interpreting her findings. It should be noted that such a 
procedure is valid only when an appropriate error term is used. It is 
doubtful that the error term yielded from the data as they are arranged in 
Rubin-Rabson’s design is appropriate for this purpose. Certainly with the 
small numbers of categories employed, these percent estimates are highly 
unreliable and should be cautiously interpreted. 


REVIEW OF EDUCATIONAL RESEARCH Vol. XII, No. 5 





A word should also be said relative to a variation introduced by Rubin- 
Rabson in the fourth study of this series (44), in which the methods were 
not rotated but were presented to all subjects in the same order. If this 
was really done, the “order” differences would be inextricably mixed or 
“confounded” with the “methods” differences. Rubin-Rabson apparently 
was not aware of this fact, since in analyzing her data she somehow ob- 
tained separate sums of squares for “methods” and for “order.” Appar- 
ently she obtained the same sum of squares twice, in one case calling it 
the “methods” sum of squares and in the other the “order” or “sequence” 
sum of squares. If this is a correct interpretation of her statements, she 
thus deducted the intermingled effects of “methods” and “order” from the 
residual or error term not once but twice. 
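
The confounding described above can be shown with a short sketch. The scores and names are hypothetical. When every subject takes the methods in the same order, the "methods" classification and the "order" classification sort the scores into identical groups, so the two sums of squares are one and the same quantity.

```python
def between_groups_ss(values, labels):
    """Sum of squares between the groups defined by `labels`."""
    grand = sum(values) / len(values)
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    return sum(len(g) * (sum(g) / len(g) - grand) ** 2
               for g in groups.values())

# Hypothetical scores: 3 subjects, each taking methods A, B, C in that same order.
scores = [5, 7, 9,   6, 8, 7,   4, 9, 8]
methods = ["A", "B", "C"] * 3   # method used at each trial
order = [1, 2, 3] * 3           # position of that trial in the sequence

ss_methods = between_groups_ss(scores, methods)
ss_order = between_groups_ss(scores, order)
print(ss_methods, ss_order)     # the same sum of squares under two names
```

Subtracting both figures from the total would therefore remove the same variation twice and leave too small an error term.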


Studies Employing Analysis of Covariance 


Most research workers in education are familiar with the “matched 
groups” type of experiment and are aware of the administrative difficulties 
involved in attempts to match groups of school pupils. Since the method 
of analysis of covariance offers a way of securing the same degree of
precision without the administrative inconveniences of matching, it is
surprising that this technique has not been more widely adopted by
educational research workers. In fact, the reviewers were able to find only one
published study in which this procedure had been employed. This is a study 
by Spencer (50), concerned with the retention of orally presented mate- 
rials. In one phase of this study Spencer tried seven combinations of dif- 
ferent frequencies and temporal spacings of the administrations of a recall 
test to pupils to whom certain expository materials had been presented 
orally. The seven experimental groups involved (one for each combina- 
tion) were each composed of eight school classes. Since the classes were 
of approximately the same size, Spencer dealt with the unweighted class 
means as individual measures. The subjects were given an initial test of 
learning ability and a final criterion test after a lapse of sixty-three days, 
during which the various experimental procedures were administered. By 
standard procedures of analysis of covariance, Spencer analyzed these data 
so as to obtain (a) a “between groups” variance adjusted for differences 
between the groups in learning ability as revealed by the initial test, and 
(b) an appropriate error variance. Hence, by means of the F-ratio he was 
able to test the hypothesis that, learning ability being constant, the
experimental procedures are equally effective. The difficulties met in the ordinary
matched groups experiment are considerable, but those which would be 
involved in an attempt to make up two groups of intact school classes 
would be almost insurmountable. The method of analysis of covariance 
permitted the equivalent of such an experiment with a minimum of ad- 
ministrative difficulties. 
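
The computation involved may be sketched as follows for a single control variable. This is the usual textbook procedure, offered as an illustration only; the source does not reproduce Spencer's computations, and the function name and all figures below are hypothetical. Each unit is a class mean, with x the initial test and y the criterion test.

```python
def ancova_f(groups):
    """groups: list of (x_values, y_values), one pair of lists per group.

    Returns (F, df_between, df_error) for the group means of y adjusted
    for a single covariate x.
    """
    def sums(xs, ys):
        # Sums of squares and products about the means of xs and ys.
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        syy = sum((y - my) ** 2 for y in ys)
        return sxx, sxy, syy

    k = len(groups)
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    n = len(all_x)

    t_xx, t_xy, t_yy = sums(all_x, all_y)            # total
    within = [sums(xs, ys) for xs, ys in groups]     # within each group
    e_xx = sum(w[0] for w in within)
    e_xy = sum(w[1] for w in within)
    e_yy = sum(w[2] for w in within)

    # Remove from y the part predictable from x, first within groups
    # (error), then for the whole sample (total); the difference is the
    # adjusted "between groups" sum of squares.
    adj_error = e_yy - e_xy ** 2 / e_xx
    adj_total = t_yy - t_xy ** 2 / t_xx
    adj_between = adj_total - adj_error
    df_between, df_error = k - 1, n - k - 1
    f = (adj_between / df_between) / (adj_error / df_error)
    return f, df_between, df_error

# Hypothetical class means for three experimental groups of four classes.
data = [
    ([48, 52, 50, 55], [30, 34, 31, 37]),
    ([47, 51, 53, 49], [33, 36, 39, 34]),
    ([50, 54, 46, 52], [29, 33, 26, 31]),
]
f, df_b, df_e = ancova_f(data)
print("adjusted F:", f, "on", df_b, "and", df_e, "d.f.")
```

One degree of freedom is lost from the error term for the regression on the initial test. With seven groups of eight class means each, as in Spencer's design, the corresponding degrees of freedom would be 6 and 48.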


Conclusion: Need for More Complete Reporting


While examining the studies cited in this chapter, as well as many others 
not mentioned, the reviewers noted again and again the sketchy and inade- 
quate manner in which the procedures used were described and the findings 
reported. In many instances it was utterly impossible to infer from the 
writer's description what statistical procedures had been employed, or, 
where the procedures could be identified, to check in any satisfactory way 
the accuracy of their application. The present review indicates a real need 
for more enlightened application in education of the newer statistical tech- 
niques, but, in the opinion of the reviewers, there is an even greater need 
for accurate, unambiguous, complete, and meaningful reporting. One need 
not present all the original data or reproduce all computations in a research 
report, but unless studies are at least so reported as to enable the reader 
to decide for himself whether or not an appropriate method of analysis 
was employed, or to check the crucial steps in the application of the method 
used (appropriate or inappropriate), the findings have no scientific value. 
In justice to the research workers it is only fair to say that the editors of 
the research journals may in part be responsible for this situation, through 
always urging brevity. They, at any rate, are in position to require the im- 
provement that is so seriously needed. 


CHAPTER V


Evaluative Studies 


ALVIN C. EURICH, C. ROBERT PACE, and EDWIN ZIEGFELD 


Scope of Chapter


EVALUATIVE STUDIES IN EDUCATION are numerous and varied. They range
all the way from casual and informal appraisals that can hardly be dig- 
nified as studies or research (1, 26) to elaborate investigations requiring 
years of time by relatively large staffs richly endowed with foundation 
support (2). No attempt is made in this chapter to review all studies that 
come within this range. That would require an entire issue of the REVIEW
rather than one chapter. The scope, therefore, is limited to a discussion 
of the over-all evaluations of entire educational programs or institutions. 
Studies involving an evaluation of schools by pupils and by outside or 
accrediting agencies have been extensively explored during the past decade. 
The chapter is limited, furthermore, to evaluative studies of secondary and 
higher education. Because practically all children who complete the ele- 
mentary grades now go on to high school, the few follow-up investigations 
that have been made at this level are primarily concerned with achievement 
in secondary schools. Although this phase is important these restrictions 
prevent investigating acceptance of responsibility and community living. 
Clark and others (6) in 1940 summarized studies on the social effectiveness 
of education in such areas as financial and vocational success and school 
subjects. 


From Measurement to Evaluation 


Marked changes have taken place during the past few years in the nature 
of the studies designed to evaluate education (40, 41, 44, 60). The most 
apparent and, perhaps, the most significant is the broader scope of these 
investigations. Whereas previously the emphasis was upon measurement of 
single aspects of achievement, discrete behavior traits, or specific abilities, 
now more and more stress is placed on behavior patterns or appraisal 
of the educational program in terms of all the major objectives it seeks to 
attain (32). 

Two major reasons can be cited for this change. In the first place, as 
measurement techniques improve or as they are available in wider variety, 
it becomes possible to measure larger and more complex aspects of be- 
havior and to judge programs in terms of more comprehensive criteria. 
In the second place, the prevalent conception of education has changed 
from one predominantly concerned with the development of specific types 
of achievement as in spelling, knowledge of American history, or physics, 
to one concerned with the development of total personalities, whatever
that might mean scientifically. Psychologically, stress on interaction now 
supplements that placed upon reaction. This has led to broader programs 
of evaluation as described by Eurich (16, 17), Foster and Wilson (22), 
Lorge (33), Raths (47, 48, 49), Smith and Tyler (51), Troyer (55), 
Tyler (57), and Wrightstone (61, 62). 


Purposes of Evaluation 


Evaluative studies are undertaken for a variety of purposes. Some of 
the most important are: 


1. To check on the effectiveness of educational institutions in terms of behavior 
changes in or achievements of students—Most evaluation in educational institutions 
has been concerned with testing or with other methods of appraising the knowledge 
and skill of individuals gained in courses (36). Educators have generally assumed 
that desirable behavior or action will follow knowledge and skill. In most courses 
the achievements of students, as measured by tests and examinations, are the imme- 
diate ones—those apparent after a short period of rather intensive training. In con- 
trast, the tests developed by the Cooperative Test Service (9), and by the Eight-Year 
Study of the Progressive Education Association (51) are not limited in their use to 
specific courses. Although the effectiveness of a program can be measured in part 
by immediate results, the fact that much education is designed for future use indicates 
the need for an evaluation of student development over a longer period of time. After 
leaving school, individuals are subjected to many situations. Their abilities to make 
the necessary adaptations in after-school life and to achieve in relation to their 
capacities constitute acid tests for an educational program. Consequently, there is 
the need for follow-up studies which are broad in scope and not limited, as were 
many of the earlier appraisals, to financial and vocational success. 

2. To plan future educational programs and procedures—This purpose follows the 
first. Most evaluative studies are being made to provide a basis for improving the 
program (25, 30, 37, 54). Again, to serve this purpose the studies must be sufficiently 
broad to affect not only the administrative policies of the institution but curriculum 
policies and the counseling of the individual student as well. In short, they need to 
cover the total program. 

3. To accredit institutions—With the development of a broader conception of edu- 
cation, considerable dissatisfaction arose with accrediting procedures concerned pri- 
marily with the financial resources, the number of books in the library, and the 
training and experience of teachers. The evaluative studies that are setting the new 
patterns for accrediting procedures are likewise concerned with all the major objec- 
tives of the institution (7, 42). They follow, therefore, the same trend as other 
studies. 


Methods Used in Evaluative Studies 


Most of the methods used in evaluative studies are given full treatment 
elsewhere in this issue. By far the commonest method in follow-up studies 
is through the use of questionnaires. These have proved to be the most 
flexible and economical, though by no means the most reliable, method for 
securing evaluative information. Rating scales are at times incorporated in 
or supplement the questionnaires. Analysis of records gives information on 
the administration of the institution and on the development of individuals 
over a period of time. Tests are perhaps the commonest evaluative measure
applied to students in school and can be constructed to measure a wide
variety of traits and behaviors. Direct observation furnishes data on some 
outcomes of education to an extent not possible by any of the above meth- 
ods. Interviews provide another method of investigation as do reports by 
the subjects. These last three, properly used, can encompass more dimen- 
sions of thinking and acting than the first four, but the results do not 
lend themselves to statistical treatment. 

The important point about methods in evaluative studies is that a wide 
variety must be used if a comprehensive evaluation is to be made. Each 
has its advantages and limitations, each complements and supplements the 
others, each is subject to misuse and misinterpretation. As new values in 
education are defined, as new needs arise, new techniques must be devised 
to appraise aspects of the educational program in order better to evaluate 
the whole. 


Descriptions of Evaluative Studies at the College Level 


Outstanding comparative studies of institutions include the extensive 
work of the North Central Association (42) and the Carnegie Foundation 
study of schools in Pennsylvania (31). The latter was based primarily on 
achievement examinations in the major subject-matter divisions of college
programs. 

Prior to 1941 most follow-up appraisals of college programs were con- 
cerned only with the very limited objectives of economic and occupational 
success (14, 15, 19, 23, 29, 34, 38, 52). The most recent and extensive 
of a long line of studies of this character is the one by Babcock (3). Its 
unique contribution lies in the fact that the data were drawn from a sci- 
entific sample, based on the Fortune poll technique, of the status of all 
living United States college graduates. It provides the most complete and 
accurate data available on the economic and occupational status of college- 
trained men and women. 

Follow-up studies concerned with a more complete range of college ob- 
jectives have been relatively few, but undoubtedly their number will in- 
crease. The most extensive, in terms of the range of objectives covered, is 
that conducted by the General College of the University of Minnesota, 
reported by Pace (45). The General College was revising its curriculum 
to bring it more nearly in line with the characteristics of its students and 
the adults they were likely to become. A 52-page questionnaire was sent 
to a sample of former Minnesota students, both graduates and nongradu- 
ates, out of college from one to twelve years. The questionnaire covered 
activities, problems, attitudes, and interests in four areas of living—voca- 
tional, home and family, socio-civic, and personal. Attractively printed and 
profusely illustrated, the questionnaire drew returns from 70 percent of 
the former students who received it. In the questionnaire were standardized 
scales to measure job satisfaction, economic status, cultural status, liberal- 
conservative attitudes, general adjustment, and morale. All the items were 
of the checklist type. Supplemental interviews were held with 172 of the
951 questionnaire respondents. 

Extensive comparisons between the responses of graduates and non- 
graduates provided the basis for judgments concerning the effectiveness of 
college education. Further judgments about the strengths and weaknesses 
of the college program were based on the extent to which the typical pattern 
of activities, interests, and attitudes of the college-trained adults corre- 
sponded to the kinds of behavior which the college faculty believed should 
characterize college-trained adults. For example, the responses indicated 
that these adults held inconsistent attitudes toward related social problems; 
that many of them were interested in effecting home economies, yet they 
followed many uneconomical practices; that they were interested in broad 
national problems but not in local and community affairs. Results such as 
these were interpreted by the faculty as reflections—at least in part—on 
the inadequacy of the college program. The basic interpretative problem 
of follow-up studies of this kind lies in this question: To what extent are 
we justified in praising or blaming the college for the activities, interests, 
and attitudes of young people who, in this study, have been out of college 
from one to twelve years? 

In the General College study some indirect evidence concerning the re- 
liability and validity of responses was available. Approximately 7 pages of 
the questionnaire consisted of standardized scales of known reliability. 
These reliabilities centered in the .80’s. Another 2 or 3 pages con- 
sisted of straightforward factual questions, such as: What is your job? 
How many children do you have? Approximately 10 pages were composed 
of questions regarding activities, such as: Did you plan expenditures on 
a budget? Did you make any articles of clothing during the past year? 
Did you vote? Did you have your teeth examined? One of the reasons 
for including so many simple questions of this “yes or no” type was the 
staff's belief that adults’ answers to them would be straightforward and 
dependable. Not much evidence was available concerning the trustworthi- 
ness of responses to the 16 pages of items dealing with attitudes, opinions, 
enjoyments, and degrees of interest and participation. Some of the items, 
however, had been used previously in studies of General College students 
and their parents (10), and cross comparisons among the various groups 
suggested that the differences in responses were in line with what one would 
expect. 

The questionnaire contained 7 or 8 pages of items designed to probe 
adults’ need for more information. The responses to these questions were 
difficult to evaluate. Interviews with a sample of the questionnaire respond- 
ents indicated that among the men a majority had checked these items be- 
cause they had actually experienced a need for the information, whereas 
among the women a majority had checked the items because they felt 
they ought to know more about them. The staff professed least confidence 
in the answers given to these questions. Many questions of this sort were 
employed by Fenlason and Sletto (20, 21) in a questionnaire follow-up 
study of social case workers in Minnesota. In a section on social case work
techniques, a checklist of thirty-seven techniques was preceded by the 
question: “Do you feel an urgent need for additional knowledge of this 
technique in the satisfactory performance of your work?” The focus of 
the inquiry upon a job analysis and the phrasing of the question in terms 
of an urgent need provide a more appropriate setting for this type of item 
than occurred in the General College study. 

Other colleges have attempted to appraise the effectiveness of their edu- 
cational programs by sending questionnaires to alumni which solicit di- 
rectly their opinions concerning the values of their college experiences. 
The follow-up questionnaire sent to former Bennington College students 
(18) and the Stanford follow-up inquiry by Isle (27) are examples. One item
in the Bennington questionnaire included a list of fifty features such as:
“1. individual conferences
with counselors; 2. social life at the college; 3. system of trial majors; 4. 
housing facilities.” The alumni rated each of these features as “very satis- 
factory,” “fairly satisfactory,” “neutral,” “rather unsatisfactory,” or “very 
unsatisfactory.” Free response or essay questions were also used. For ex- 
ample, “List in order of importance the experiences, courses, or instructors 
that, in your judgment, made the greatest contribution to your development 
while at Bennington.” “What defects were there in your college work 
as you see it now?” The Stanford follow-up questionnaire was designed for 
alumni who were engaged in teaching or in other educational work. Rating 
scales, checklists, and free response items were also included. 

The chief interpretative problem in this type of questionnaire lies in the 
extent to which alumni can introspect reliably about the values of their 
college experiences. It is likely that old graduates develop a halo about 
the values of their education which tends to make their judgments unre- 
liable. The purpose of the investigation may likewise influence results. 
Criticisms on both these points can be made of the work of Tunis (56) and 
Rogers (49). Both of these studies included old graduates, and both studies 
were made to celebrate college anniversaries: the former to commemorate 
the 25th reunion of a Harvard class, and the latter to celebrate the 75th 
anniversary of Vassar College. Greater faith in the results is justified when 
the subjects are recent graduates and when the purpose of the investiga- 
tion is frankly to evaluate and improve the college educational program. 
Recent graduates experiencing the demand of new tasks may have valuable 
insights to contribute to the college faculty regarding the strengths and 
shortcomings of the instructional program. Isle (27) reported that Stanford 
graduates were much more critical of the Stanford program than were their 
employers. 

The follow-up study, as a means of appraising the effectiveness of edu- 
cation, is used most appropriately as but one phase of a larger pattern of 
evaluation of a college program. The complete evaluative studies of Ben- 
nington College, the General College of the University of Minnesota (12), 
and Stanford University’s School of Education (35) used standardized
achievement tests, numerous questionnaires, checklists, interviews, and rating
scales with students and faculty in college as well as a follow-up of
graduates. 


Evaluative Studies at the Secondary-School Level 


Four major evaluations using follow-up studies as part of the total pat- 
tern of appraisal may be cited—the American Youth Commission, the 
Cooperative Study of Secondary School Standards, the New York Regents’ 
Inquiry, and the Progressive Education Association Eight-Year Study. 

Stimulated by the youth problem of the depression years, Bell’s study 
(4) for the American Youth Commission was a pioneer attempt to deter- 
mine and analyze the status of a representative sample of Maryland youth, 
ages sixteen to twenty-four. Interviews were held with more than 13,000 
young people, sampled on the basis of sex, race, school status, job status, 
social and economic status, marital status, type of community, and other 
relevant factors. The study was concerned with youth at school, home, at 
work, play, and church. In appraising their schooling, three-fourths of the 
total group said they had received no vocational guidance (yet economic 
security was their most pressing problem), 27 percent attributed little or 
no economic value to their schooling, and 12 percent attributed little or 
no cultural value to it. The amount of schooling youth received and their 
appraisal of its value were clearly related to the economic and occupational 
status of their parents. 

The follow-up phase of the Eight-Year Study has been reported by Cham- 
berlin and others (5). Graduates from thirty experimental high schools 
were matched with graduates of traditional high schools on factors of 
scholastic aptitude, sex, race, age, religious affiliation, size and type of 
high school, size and type and location of community, socio-economic status 
of family, and extracurriculum activities in high school. The subsequent 
success in college of both groups was compared. College records and re- 
ports, special questionnaires and tests, and personal interviews were used 
to gather evidence with respect to nine aspects of college success: intel- 
lectual competence, cultural development, practical competence, philoso- 
phy of life, character traits, emotional balances, social fitness, sensitivity 
to social problems, and physical fitness. In many of these areas or aspects 
of competence the mass of data for each student was summarized by means 
of judges’ estimates of its correspondence to briefly described behavior 
levels or types. For example, to get a judgment on the extent of students’ 
interest in current affairs each case was rated to fit one of the following 
five categories (5, p. 13): 

1. Matters of social, economic, political, and humanitarian significance command 
his interest and objective study. Does something about it. Membership, writing, con- 
tributing, agitating. 

2. Considerable reading and discussion of these matters. Many matters of social, 
economic, political, and humanitarian significance command his interest. May or may 
not do anything about it. 


3. Somewhat limited or inconstant interest in many phases of these matters. Usually
aware of them.

4. Limited to certain phases, or sporadic only. Little attempt to keep up with what
is going on.

5. Not interested at all.


This rating technique has been used previously and comprehensively by 
Darley and Williams (11) in analyzing the case records of one hundred 
students of Minnesota’s General College and their parents. Williamson and 
Bordin (58, 59) used a similar method of rating materials from case 
records in evaluating the success of student counseling. 

Other features of the graduate follow-up in the Eight-Year Study are 
noteworthy. The staff selected graduates from the six schools whose pro- 
grams were judged to have departed most significantly from traditional 
practices, and then from the two most experimental schools, and compared 
them with matched students from traditional schools. In these comparisons 
differences between experimental and control groups were successively 
greater than the ones revealed between the total experimental and control 
groups. Simple and common sense methods of analysis such as these are 
highly appropriate for evaluative data which, at the present level of devel- 
opment, are not precise in any mathematical sense. Final value judgments 
of the success, goodness, or effectiveness of educative experiences are and 
should be based on the main trends and major patterns of results from 
a variety of data. 

Follow-up appraisals in the Regents’ Inquiry are reported by Eckert and 
Marshall (13). A battery of tests of information, skills, aptitudes, and 
attitudes was given to pupils about to leave high school. Further data were 
obtained from questionnaires and from school principals after the pupils’ 
withdrawal. Analysis of the traits and attitudes, patterns of interest, plans 
for the future, and present school and work activities thus obtained pro- 
vides evidences of the social competence of leaving pupils, and by impli- 
cation, of the success of the schools’ programs. Subsequent interviews with 
the pupils, principals, and a third party, held several months after with- 
drawals, gave further data for judgment on the vocational, social, and 
leisure-time adjustment of former pupils. 

In the Cooperative Study of Secondary School Standards (7) a seven- 
point program of appraisal was employed in judging two hundred second- 
ary schools: use of the Evaluative Criteria (8), judgments of field com- 
mittees, progress of pupils as measured by standard tests, college success 
of pupils, noncollege success of pupils, judgments of pupils, judgments 
of parents. College success was measured solely in terms of academic suc- 
cess. Noncollege success was judged from responses of former pupils to a 
questionnaire which called for ratings on reasons for leaving school, help- 
fulness of school in vocational placement and progress, contribution of 
school in developing various appreciations and interests, and general com- 
ments on the satisfactoriness of high-school experience. Judgments of pupils 
in school were likewise obtained from questionnaires combining rating scale 
and essay type items. Parents’ judgments were solicited by questionnaire 
rating scales with respect to their degree of satisfaction with twelve aspects 
or goals of the school’s program, such as good citizenship, social life,
educational guidance, and reading habits.

Several other evaluations of schools by students, parents, and laymen
have been made (24, 28, 39, 43, 46, 50).


Contributions in Method: Approaches to Evaluation 


The studies reviewed here are the most comprehensive. Many other 
studies could have been cited that used one or more of the wide variety 
of methods employed in the more comprehensive investigations. The con- 
tributions to methods made through these extensive investigations have 
been general rather than detailed. No new techniques were developed in 
any of these major investigations, but new ways of adapting and applying 
them were developed. The methods had all been used and tried in previous 
but more limited studies. For example, the institutional pattern map as 
used in the North Central Association studies had been used previously in 
a less extensive study of New York colleges. In essence the map was an 
adaptation to institutions of the profile charts for individuals, and the 
data for each variable on the map were gathered through common tech- 
niques, such as tests, questionnaires, records, reports, and interviews. Some 
major contributions, however, were made in adapting old methods to the 
comprehensive problems investigated and in setting a pattern for broad 
approaches to evaluation. These contributions are: 


1. In refinement of methods—For example, the 52-page questionnaire 
used in the Minnesota study represents a high point in the use of this 
technique. Great care was taken in formulating each item in order to 
eliminate ambiguities. The questionnaire was tried out in preliminary form 
before it was printed with attractive layouts and many illustrations. In these 
ways a method that had been used extensively was refined for this particu- 
lar study. Similarly, in other investigations the older methods were ex- 
tended and refined in application. 

2. In the use of a wider variety of methods—In the more comprehensive 
evaluations, a wider variety of methods has been used than was common 
in the earlier studies. In the Eight-Year Study, for example, interviews, 
questionnaires, rating scales, records, tests and examinations, and analysis 
of curriculums were all used. The best of the methods developed in previous 
investigations were applied in a single study in an effort to arrive at a more 
comprehensive evaluation. 

3. In analyses of major educational objectives as the basis for final 
evaluation—In practically all the major investigations reviewed in this 
chapter, emphasis was placed upon the analysis of objectives as a basis 
for determining what data should be collected. In the Bennington study, 
college records, founders of the college, students, faculty, and trustees all 
contributed to an analysis of the objectives of the college. After these 
objectives were defined through extensive deliberations they provided the 
basis for collecting information through questionnaires, tests, rating scales, 
records, and interviews. Briefly, this procedure, upon which the major 
studies agree, means that a sound evaluation of an educational program 
can be made only in terms of the major objectives that program is set 
up to attain. 

4. In following up individuals over a relatively long period of time— 
The evaluative studies of recent years have sought to correct a deficiency 
in previous studies by following up individuals over a relatively long 
period of time. The Pennsylvania study (31) established a new pattern 
in this regard. Likewise the Eight-Year Study by its title designates a 
follow-up period. The spot evaluations of the earlier studies made over 
a brief interval were clearly inadequate because they did not reveal 
the contributions of the educational program to after-school life. As long 
as such was the case, institutions could make all kinds of unfounded 
claims for their programs. If follow-up studies of this type continue to 
be made the time may come when such claims are regarded as no more 
significant than the tales of the miraculous healing qualities of herbs 
peddled by the old medicine man. 

More long-period follow-up studies must be made before more adequate 
appraisals of education will be available. This will mean fewer but more 
significant evaluative studies. 


Contributions in Terms of Results 


1. The contributions of education are clearly disappointing if viewed 
from the standpoint of the claims made for it. The Pennsylvania study 
(31) discovered unbelievably low accomplishments in many students 
who had the advantages of a college education. In the New York Re- 
gents’ Inquiry, Spaulding (53) found that: 


Among the boys and girls leaving school each year are a considerable number whom 
the schools themselves are unwilling to recommend for responsible citizenship. (p. 17) 


Irrespective of the schools’ judgment of their readiness for citizenship, the leaving 
pupils as a group are seriously deficient in their knowledge of the problems, the 
issues, and the present-day facts with which American citizens should be concerned. 
(p. 18-19) 


The boys and girls who are on the point of leaving school, whatever they may think 
about the desirability of certain kinds of action, are reluctant to assume responsibility 
for civic cooperation, or to commit themselves to action which will involve personal 
effort or sacrifice. (p. 24) 


Once he is out of school, the ordinary boy or girl does practically nothing to add 
to his readiness for citizenship, nor does he even keep alive the knowledge of civic 
affairs or the interest in social problems which he may have had when he finished 
his schooling. (p. 27) 


In the Minnesota study, Pace (45) found that: 


The graduates were distinguished from the nongraduates chiefly by the fact that 
they were more likely to have professional jobs, a little—but not much—more income, 
and somewhat greater satisfaction with their jobs. In other areas of living there were 
few differences, or none at all, between graduates and nongraduates. (p. 51) 

Briefly then, the results of these follow-up studies show that education 
has not been as effective as claimed by those responsible for carrying it on. 

2. Broader evaluations have been made possible—This might be ex- 
pected from the application of a wide variety of evaluative methods 
growing out of an analysis of objectives. These broader evaluations have, 
in turn, made possible better solutions to practical problems such as 
those involved in accrediting. The pattern map of the North Central 
Association and the Evaluative Criteria of the Cooperative Secondary 
School Study are examples. 

3. An array of judgments of relatively mature individuals on the 
effectiveness of the educational process has been provided—For the most 
part these judgments give a more encouraging picture of the contribution 
that schools and colleges have made to the lives of individuals than do 
test results or analyses of the civic or social life of former students. 

4. Perhaps the major contribution is that derived by the institutions 
which conduct them—If the study is made with all groups participating, 
as was the case in the Bennington study, in which students, faculty, and 
trustees all took part in a cooperative undertaking, the effect upon the 
institution is a major one. Thus, evaluative studies can and do form an 
important step in the development of the program at a given institution. 


Major Criticisms of Evaluative Studies 


Major criticisms of evaluative studies which need to be guarded 
against in future investigations are: 


1. The instruments used are not adequately appraised. The question- 
naires, for example, are in many cases elaborate but their validity and the re- 
liability of results are too infrequently established. The refinement of 
instruments may be only an elaboration and may not actually contribute 
to the accuracy and dependability of results. 

2. The direct contribution of the schools and colleges is not isolated 
from contributions of other experiences. Because the individual is the 
product of all his experiences and his native endowment, it is practically 
impossible to isolate the contributions of the educational program to his 
development from the contributions of other experiences. In follow-up 
studies it is desirable that the investigation be extended over as long a 
period as possible. The longer the period, however, the more difficult it 
becomes to determine the effects of the educational program because many 
other experiences intervene. From the standpoint of the individual it is 
probably not important to isolate the contributions of school or college 
experiences. From the standpoint of evaluating the educational program 
it becomes very important to do so. The “control technique” as used in 
the Eight-Year Study is the only method used in the follow-up investi- 
gations for isolating, at least in part, the contributions of one form of 
education over another. 


3. Elaborateness of studies is discouraging to many school systems and 
institutions that desire to make an over-all evaluation of their programs. 
No simple and inexpensive technique has as yet been devised nor is one 
likely to be devised that will provide an evaluation of an entire educa- 
tional program. For this reason many schools and colleges feel they cannot 
make an evaluation of their program. Every institution, however, has an 
evaluation process going on at all times whether or not it is recognized 
as such. The comprehensive investigations which have been undertaken 
primarily as research projects provide many valuable suggestions. Al- 
though simpler methods may need to be devised for ordinary use this 
criticism need not be taken too seriously at this time. It is mentioned 
here primarily because many institutions have used it as a rationalization 
for not making a more concerted effort to evaluate their programs. 

4. Evaluative studies do not reveal contributions to the development of 
the individual. Practically all the evaluative studies summarize data for 
groups. Case studies have been made but are difficult to interpret. In ex- 
tending research studies there is considerable need for making more ex- 
tensive case studies over a long period of time. 


In spite of these criticisms, considerable progress has been made dur- 
ing the past decade in extending the scope of evaluation studies. More 
and more school systems, colleges, and universities are planning over-all 
evaluation programs in terms of their major objectives. The question 
usually arises as to who should carry on the process. Clearly if the pur- 
pose of evaluation is to provide a basis for improving the educational 
program, those responsible for this program should take an active part 
in the evaluation. To be sure, competent technical direction is needed. 
When provision is made for such direction the institution, through a co- 
operative undertaking, may make further contributions to the methods for 
carrying on evaluative studies as well as to the effectiveness of its own 
educational program. 

CHAPTER VI 


Questionnaires, Interviews, Personality Schedules 


FRANK W. HUBBARD 


The Questionnaire 


In Chapter IX of the December 1939 issue of the Review the major 
criticisms of the questionnaire technique were pointed out. Recent pro- 
fessional literature reveals, however, that in spite of the advice given in 
this and other research publications, the common mistakes continue to 
appear with some frequency. There has been some tendency to restrict 
the use of the term “questionnaire” to forms that require statistical and 
objective replies; the terms “opinionaire” and “expressionaire” have been 
used by workers at the Character Education Institute of Washington Uni- 
versity (15) on forms calling for subjective and qualitative answers. 

Relatively few research studies have appeared since 1939 dealing di- 
rectly and experimentally with the questionnaire as a method of in- 
vestigation. A number of recent books and articles have, however, treated 
the methods, errors, and limitations of the questionnaire technique. Par- 
ticularly useful are the chapters by Koos (14), Lindquist (19), Lundberg 
(20), Toops (41), Wert (42), and Young (44). Magazine articles with 
helpful suggestions have been prepared by Jenkins (11, 12) and 
Phillips (27). 


Preparing the Questionnaire 


Wording—The use of the questionnaire technique in surveys of opinions 
has led to greater attention to the wording of questions. Cantril (7) 
and Rugg (32), from experience with the Princeton Public Opinion Re- 
search Projects, reported that the use of the names of prominent men 
(such as Roosevelt or Hitler) definitely influenced the responses to 
specific political questions. Blankenship (3) pointed out the existence 
of “danger words” characterized by emotional appeal, ambiguity, and too 
high vocabulary level. Jenkins (11) suggested that there are four major 
ways in which questions may reduce the dependability of the answers: 
(a) by predetermining the answer thru the use of leading or “loaded” 
questions or thru the improper order of the questions; (b) by the use 
of ambiguous terms and vague questions; (c) by exceeding the ability 
of the respondent to use unfamiliar words or to deal with complex ques- 
tions; and (d) by inviting inaccurate responses. Ghiselli (9) found that 
respondents were more willing to reply and that their replies were gen- 
erally more satisfactory when they were allowed to qualify their opinions 
than when they were forced merely to agree or disagree with a fixed 
statement. 

Pretesting—Both Blankenship (4) and Sletto (35) reported that pretesting 
the questionnaire with small groups insures greater reliability 
in the answers and increases the probability of a satisfactory return. 


Questionnaire Administration: Sampling; Follow-Up 


Sampling—Lindquist (18) called attention to the fallacy of using 
merely a large number of individual replies from a few schools when 
in reality the school is the unit. He recommended “stratified sampling,” 
or the selection of cases from each identifiable subgroup within the 
total population under investigation. Both Reid (28) and Stanton (36) 
showed that those responding to mail surveys are not necessarily 
representative of the nonrespondents. In both studies, which had to do with 
the ownership of radio equipment, it was shown that the “haves” are 
more likely to reply than the “have nots.” Follow-up procedures are 
recommended to increase the proportion of the returns and thereby to reduce 
the possibilities of bias. Suchman and McCandless (37) found thru further 
mailed questionnaires and telephone interviews that the interest of the 
recipient in the topic under consideration and the amount of his educa- 
tion affected the return. Pace (26), in attempting to determine the prob- 
able direction of the bias in a study of college alumni, reported that gradua- 
tion from the university and the number of quarters of university work 
completed were important factors in influencing the returns. 
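
The stratified selection described above can be illustrated with a short sketch. It is not taken from any of the studies cited; the function name, the grouping of schools by size, and the 20 percent sampling fraction are all assumptions made for illustration. The school, not the individual reply, is treated as the unit, and the same fraction is drawn from every subgroup.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw the same fraction of units from every stratum (at least one each)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population of 100 schools grouped by size (assumed data).
population = (
    [{"school": f"S{i}", "size": "small"} for i in range(60)]
    + [{"school": f"M{i}", "size": "medium"} for i in range(30)]
    + [{"school": f"L{i}", "size": "large"} for i in range(10)]
)

sample = stratified_sample(population, lambda u: u["size"], fraction=0.2)
counts = Counter(u["size"] for u in sample)
print(dict(counts))
```

Because every subgroup contributes in proportion to its size, no single group of schools can dominate the returns merely by supplying a large number of replies.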

In studying the flying habits of the patrons of airlines Rollins (30) 
found that a follow-up questionnaire gave a truer picture than a single 
inquiry and that double inquiries to a small list yielded greater returns 
than a single inquiry to a larger list. Shuttleworth’s investigation (34) of 
the employment status of majors in technology confirmed the possibility 
of large sampling errors from incomplete returns. Returns from the original 
inquiry showed only 0.5 percent unemployed; replies from duplicate 
questionnaires used as a follow-up raised the percent of unemployment 
to 6.6; then a final drive, resulting in almost complete coverage of the 
group, brought the figure down to 4.0 percent. 
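
The way in which an estimate shifts as successive waves of returns come in can be shown with a small worked example. The counts below are hypothetical and are not Shuttleworth's data; they are chosen only to show how a cumulative percentage is recomputed after each follow-up.

```python
def cumulative_rates(waves):
    """Return the cumulative percent unemployed after each wave of returns.

    waves: list of (replies_received, number_reporting_unemployment) tuples,
    one tuple per mailing or follow-up drive.
    """
    total_replies = 0
    total_unemployed = 0
    rates = []
    for replies, unemployed in waves:
        total_replies += replies
        total_unemployed += unemployed
        rates.append(round(100.0 * total_unemployed / total_replies, 2))
    return rates


# Hypothetical counts (assumed): original inquiry, follow-up, final drive.
hypothetical_waves = [(200, 2), (100, 12), (100, 2)]
rates = cumulative_rates(hypothetical_waves)
print(rates)  # [1.0, 4.67, 4.0]
```

In this sketch the early respondents understate unemployment, the first follow-up overcorrects, and only the nearly complete return settles near the final figure, which is the general pattern the incomplete-return studies warn about.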

Increasing the returns—Moore (23) varied his procedures in submitting 
a questionnaire to superintendents of schools. He found that typewritten 
letters of transmittal brought 16 percent more responses than did dupli- 
cated letters. Follow-up letters produced a further 16 percent increase 
in the number of replies. 


Reliability of Questionnaire Returns 


Lewis (17) investigated the consistency of the replies of 216 teachers 
when asked to respond on two different occasions to the same question- 
naire. He reported that in the second reply more than half of the teachers 
varied their responses on more than half of the items. In the most con- 
sistent report discrepancies were found with respect to only 16 percent 
of the items; in other cases, the responses varied on as high as 96 percent 
of the items. Lentz (15), using a social science “opinionaire,” found changes 
on specific questions occurring with amazing frequency. Such variation 
would be expected, however, in responses to questions dealing with opin- 
ions. He concluded that the summation of reactions on a number of items 
was reliable. Neprash (24) studied the statistical reliability of question- 
naire returns, reporting a 20 percent unreliability of responses to specific 
questions on social attitudes and opinions. These studies indicate the im- 
portance of proper questionnaire procedure and the limitations of certain 
types of replies. 
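
The contrast between unstable single items and a stable summed score can be sketched as follows. The responses are invented for illustration and do not come from Lewis, Lentz, or Neprash; the function names are likewise assumptions. The sketch computes, for each respondent, the share of items answered differently on the second occasion, and then the correlation between total scores on the two occasions.

```python
def share_changed(first, second):
    """Fraction of items answered differently on the two occasions."""
    changed = sum(1 for a, b in zip(first, second) if a != b)
    return changed / len(first)


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical five-point responses: 4 respondents by 6 items (assumed data).
first = [
    [1, 2, 1, 2, 1, 2],
    [2, 3, 2, 3, 2, 3],
    [3, 4, 3, 4, 3, 4],
    [4, 5, 4, 5, 4, 5],
]
second = [
    [2, 1, 1, 2, 2, 1],
    [3, 2, 2, 3, 3, 3],
    [3, 4, 4, 3, 3, 4],
    [5, 4, 5, 4, 4, 5],
]

shares = [share_changed(a, b) for a, b in zip(first, second)]
r = pearson([sum(row) for row in first], [sum(row) for row in second])
print(shares)
print(round(r, 3))
```

In these invented data between one-third and two-thirds of each respondent's answers change, yet the totals on the two occasions remain almost perfectly correlated, because the item-level shifts largely cancel in the sum.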


Special Uses of Questionnaires 


Byler (6) reported the use of inquiry forms in ascertaining the interests 
and needs of students prior to a program of curriculum revision. Lewis 
(16) found the questionnaire procedure effective in calling the attention 
of pupils to certain problems of pupil conduct, study hall conditions, etc. 
When followed by discussion in homerooms the results were apparently 
more satisfactory than the “preaching” plan so often followed in high 
schools. Gilkinson and Knower (10) described a multiple-choice guidance 
questionnaire for students of speech. Cureton (8) recommended the 
questionnaire (and other procedures such as the interview) in promoting group 
thinking. Outland and Jones (25) described the application of the 
questionnaire to curriculum appraisal. They warned of the limitations of 
the procedure, however, since pupils usually are interested in immediate 
problems more than in remote matters even tho the latter are vital. 


The Interview 


Few research studies have appeared recently with respect to the inter- 
view. Symonds (38), in an article on research in this field, cited only four 
studies during the entire period from 1926 to 1934. Based chiefly on ex- 
perience rather than on the results of controlled studies, a number of gen- 
eral discussions of the interview technique are available. Among these are 
articles by Aldrich (1), Johnson (13), Symonds (39), and Wilkins and 
Kennedy (43). One usually finds in these discussions considerable emphasis 
on the importance of “rapport” with the person interviewed; the necessity 
of privacy; the wisdom of making systematic records after the conference; 
and the advisability of verifying data and using caution in making interpre- 
tations. The inquiry forms devised for a personnel research study in the 
General College of the University of Minnesota (22) have been termed “a 
contribution to the tools of social inquiry.” 

Schellhammer (33) suggested three ways of compensating for variability 
due to personal elements in an interview: (a) arrange several interviews 
per interviewee, each conducted by a different consultant; (b) have each 
interview conducted by a series of experts, each concerned with the phase 
related to his own field of specialization; or (c) have several consultants 
sit as a committee to conduct each interview. He also recommended that 
each interview be focused on a single, clearly defined purpose. 


Recording the Interview 


Symonds (40) attempted to discover experimentally the amount and 
types of material forgotten in various periods between the interview and 
its recording. He concluded that records made immediately after the inter- 
view contained the maximum of details. Most significant parts of the inter- 
view, however, were usually not forgotten even tho not immediately re- 
corded. The lapse of a reasonable interval of time resulted in dropping 
relatively unimportant details and may have favored integration of sig- 
nificant facts. 


Vocabulary of the Interview 


Many of the studies that apply to the formulation of questions in ques- 
tionnaires apply equally to the interview procedure. Roslow and others 
(31) reported on the basis of field studies of the Psychological Corporation 
that the use of stereotypes or emotionally charged words may produce 
marked changes in responses; also, that the responses to alternatives in 
checklist questions are influenced by the number and the completeness of 
the alternatives. Young (45) called attention to the differences in meanings 
for different persons of such common terms as “housing,” “unemploy- 
ment,” and “cost-of-living.” Her recommendations as to careful organiza- 
tion of the blank, recognition of the background of respondents, use of 
simple words, and avoidance of emotionally charged words are similar 
to the proposals usually made with respect to questionnaires. 


Special Uses of the Interview 


As is true of the questionnaire, the interview is being used more widely 
in administration and instruction. For example, Anderson (2) suggested 
that pupils be encouraged to prepare “imaginary interviews” with the 
characters of history since the method requires familiarity with facts about 
each character studied. Robinson (29) suggested the interview as a way 
to discover administrative problems and to receive suggestions from teach- 
ers with respect to their solution. Merrill (21) advocated the use of the 
interview in obtaining material for high-school papers and in other news- 
gathering activities. Brophy (5) found the interview a valuable supplement 
to test results and questionnaire reports in counseling university students. 


Personality Schedules ¹ 


Personality and character tests were treated in Chapters V and VI of the 
February 1941 issue of the Review. Only a brief note, therefore, will be 
added here. 


1 This section was prepared by Ruth Strang. 

The California Test of Personality (49) covers somewhat the same areas 
as the well-known Bell Adjustment Inventory, but includes a lower age 
level. The profile constructed on the basis of the responses is divided into 
two sections: self-adjustment and social adjustment. Somewhat different in 
form and most carefully developed is the Detroit Adjustment Inventory 
(46) designed for junior and senior high-school students. It consists of 
120 items dealing with 24 “problem situations.” For each item there are 
five statements in the first person of which the subject indicates the one 
that describes him best. This inventory has been in use for more than 
three years in the Detroit Psychological Clinic and significant differences 
were found between scores of sixty-one behavior and twenty-seven non- 
behavior cases. The Minnesota Multiphasic Personality Schedule (53) is 
superior to the personality schedules whose scoring keys were constructed 
on a statistical rather than on an experimental basis; it has been standard- 
ized on 1,500 normal individuals and on 220 psychopathic patients. The 
unique features of My Personality Growth Book developed by McCall and 
Herring (54) are emphasis on improvement in personality and use pri- 
marily as a teaching instrument. Five new experimental scales were de- 
veloped by Darley and McNamara (50) on the basis of factor analysis 
applied to test and retest performance on thirteen existing attitude and 
adjustment scales. The numerical high point in factor analysis was reached 
by Brogden (48) who made a factor analysis of the character traits in- 
volved in the scores of forty tests purporting to measure various phases 
of character, intelligence, and personality. No one has yet demonstrated, 
however, the correspondence between the traits which come out of the factor 
analysis and the personality patterns of individuals. 


Critical Appraisal of Personality Measures 


The approach to the study of personality through tests of evaluative 
attitudes has proved its worth, according to the comprehensive review of 
this type of test made by Duffy (52). When certain other types of self- 
estimate questionnaires have been subjected to realistic validation, as by 
Bonney (47), Dudycha (51), Ryans (55), and Wile (56), individual 
errors of judgment have been found to be extensive. The findings of Bonney 
(47) are welcome, if not reassuring. The reports of fifth- and sixth-grade 
pupils on their absences, library book withdrawals, Sunday-school attend- 
ance, and weekly spelling scores for one semester, when compared with 
the actual records, showed an average complete accuracy of only 27 percent 
and an approximate accuracy of 43 percent. Accuracy of estimate seemed 
not to be related to chronological age or I.Q. 

Additional evidence of the inaccuracy of self-appraisal in individual 
cases was presented by Ryans (55) on the basis of his analysis of test 
scores and self-ratings on a five-point scale of items on (a) correct English 
usage, (b) effectiveness of English expression, (c) speed of reading com- 
prehension, (d) understanding of difficult reading materials, (e) extent 
of vocabulary, (f) general cultural knowledge, and (g) knowledge of 
current happenings. Self-appraisals, in the form of group averages, were 
reasonably accurate but individual errors of judgment were extensive in 
many cases. Wile (56) obtained experimental evidence of the lack of valid- 
ity of personality tests by comparing the diagnostic statements derived 
from several methods of personality study with the case records of one hun- 
dred clinic children and with items determined by the chance selection of 
playing cards. The percentage of correct statements derived from the chance 
test was as high or higher than the results obtained through several methods 
alleged to have true diagnostic value. 

CHAPTER VII 


Test Development: Statistical Aspects ¹ 


NICHOLAS A. FATTU 


For the period reviewed, the large number of publications, either 
directly or indirectly related to the statistical aspects of test develop- 
ment, made it imperative that severe restrictions be imposed on the scope 
of the following discussion. Correlation, analysis of variance, and other 
techniques were not reviewed unless they happened to relate to tests. 
It may be observed that although the greatest frequency of publications 
was found under factor analysis, the studies apparently showing the 
keenest insight in terms of analysis of basic concepts appeared under 
reliability. 


Bibliographies and General Discussions 


In a restricted manner this review continues the previous one by Flanagan 
(24). Cook (9) and Potthoff (52) gave general discussions of achieve- 
ment testing. Horst (38) considered the logical bases underlying all 
testing and especially personal adjustment. Dunlap (20) presented a classi- 
fied bibliography of statistical developments over a three-year period, 
and concluded with a summary of certain statistical errors made by 
psychologists. Swineford and Holzinger (59) continued their annual 
summaries of statistical developments. Weidemann (62) reviewed the 
conflicting evidence concerning the inconsistency of measurement indicated 
by essay as compared with objective studies. Jackson and Ferguson (41) 
discussed studies of the concept of reliability. Though not restricted to 
test statistics, the Journal of the Royal Statistical Society has published 
a rather comprehensive annual bibliography of statistics since 1936, 
largely under the direction of J. O. Irwin. 


1 Statistical methods are treated also in Chapters IV and VIII of the present issue; chapters devoted to 
statistical methods have previously appeared in the Review of Educational Research as follows: ¶ John 
C. Flanagan, “Statistical Methods Related to Test Construction and Evaluation.” 9: 109-30; February 1941. 
¶ Karl J. Holzinger, “Factor Analysis.” 9: 528-31, 619-21; December 1939. ¶ Douglas E. Scates, “Index 
Numbers and Related Composites.” 9: 532-42, 622-25; December 1939. ¶ Palmer O. Johnson, “Statistical 
Methods.” 9: 543-54, 626-29; December 1939. ¶ Max D. Engelhart, “Classroom Experimentation.” 9: 555-63, 
629-30; December 1939. ¶ Edward E. Cureton and Jack W. Dunlap, “Developments in Statistical Methods 
Related to Test Construction.” 8: 307-17, 357-62; June 1938. ¶ Herbert A. Toops and G. Frederic Kuder, 
“Test Construction and Statistical Interpretation.” 5: 229-41, 309-14; June 1935. ¶ G. M. Ruch, “Recent 
Developments in Statistical Procedures.” 3: 33-40, 65-72; February 1933. ¶ Additional brief treatments 
occurred in the following chapters: December 1938, Chapters V (Frutchey) and VII (Scates); December 1935, 
Chapter III (Lindquist); February 1933, Chapter II (Osburn); and October 1932, Chapter IV (Baker).—Editor. 


Factor Analysis 


The popularity of factor analysis methods is attested by the fact that 
the greatest frequency of studies was found in this area. A number of 
trends seemed to be discernible. There was an increased emphasis upon 
the logical implications of factors and an attempt to reach some kind 
of synthesis among the various methods. Increased ease of computation 
was being sought with some success, and more nearly adequate tests of 
significance seemed to be in the process of incubation. As usual, the 
empirical application of the method formed the modal type of study. No 
new methods of factor analysis were found in the educational field. 

Among the attempts to synthesize the various factor systems and to 
indicate where each might be used appropriately were the publications of 
Burt (2), Holzinger and Harman (35), and Swineford (58). Holzinger 
and Harman (35) considered the various types of factor analysis and 
pointed out the desirability of separating the statistical aspects from the 
theories in a particular field. They gave a discussion of statistical criteria 
leading to a choice of form and method, and prepared a compilation of 
the leading factor systems. Swineford (58) compared multifactor and 
bifactor analysis. Burt (2) emphasized the essential similarity of all 
factor theories and all methods of factor analysis, a contention he sup- 
ported with data on temperament types. Burt urged that factor analysis 
be regarded as a logical rather than a mathematical method. Logically, 
he insisted, factors were principles of classification which specified a 
system of relations. A factor was valuable because it enabled us to hold 
in mind a definite but complex pattern of characteristics. For this point 
an excellent illustration was provided by two studies of verbal ability. 
Carroll (3) confirmed Thurstone’s M, but V was split into three factors, 
and W was split into two. Johnson and Reynolds (43), however, found 
two factors which appeared to be closely related to, if not identical with, 
Thurstone’s W and V factors. 

Other developments were Horst’s (37) method for transforming an 
arbitrary factor matrix into simple structure by a method which is almost 
entirely objective; Coombs’s (10) and McNemar’s (47) discussions of 
criteria for determining the number of factors to be extracted; and 
Ferguson’s (21) indication that differences in difficulty between two tests 
or two test items were represented in the factorial configuration as addi- 
tional factors suggesting that if all tests included in a battery were roughly 
homogeneous with reference to difficulty, existing hierarchies would be 
more meaningful. Guilford (30) brought out the same point when he 
indicated that the same kind of item might measure different abilities 
according as it was easy or difficult for the individual. General discussions 
of factor analysis were given by McCloy (45) and Holzinger (36). 

Two discussions of tests of significance were worthy of mention. Mc- 
Nemar (46) reported that three empirical studies on factor loadings agreed 
in showing that sampling behavior of the first centroid factor loading 
was much like that of correlation coefficients, whereas sampling fluctua- 
tions for loadings beyond the first were distressingly large. Young (64) 
applied the method of maximum likelihood to the problem of estimation 
in factor analysis. In a special case he observed how test fallibility entered 
into factor determination, and that the method of communalities 
underestimated the number of factors. It is hoped that Young will con- 
tinue his studies to include the more general case. 


Reliability 


Especially noteworthy among studies of reliability was the increasing 
substitution of an analytic rational, rather than a crude empirical, attack 
on the basic problems and concepts. Outstanding in this respect were 
the studies of Hoyt (39), Jackson and Ferguson (41), and Kelley (44). 

Kelley (44) defended the traditional odd-even reliability coefficient 
and indicated that it was a quantitative statement of an act of judgment 
that the things correlated were similar measures. It was because the 
Kuder-Richardson formulas ignored this act of judgment that they were 
inadequate. Kelley argued, further, that it was less severe to split halves 
than to draw up items in the first instance which measured the same 
function. 

Though he questioned the Kuder-Richardson formulation of reliability, 
Kelley considered the idea important in connection with a definition 
of the “coefficient of coherence”—a measure of the singleness of purpose 
of the items constituting the test. Kuder and Richardson assumed complete 
unity of purpose when they assumed a rank of one for their correlation 
matrix of test items, but Kelley thought it better to measure actual prox- 
imity to a rank of one by computing the “coefficient of coherence.” 

In a penetrating analysis of reliability, Jackson and Ferguson (41) 
showed that reliability may be interpreted with emphasis on errors of 
measurement, on stability of scores, or on sensitivity for assessing in- 
dividual differences. They would abandon the blanket term reliability 
in favor of more specific estimates of absolute and relative accuracy of 
measurement. Estimates of reliability, they suggested, could best be ob- 
tained by the analysis of variance since it separated the influence of 
errors, practice, and individual differences, and facilitated the computation 
of Jackson’s measure of sensitivity. 

Several experimental studies were cited to show that retest, comparable 
form, and split-half reliability were not the same. The Kuder-Richardson 
measure, they indicated, was based upon internal consistency whereas the 
usual definition implied agreement between two sets of measurements. 
To find the most reliable combination of a group of tests a method 
of combinatorial analysis was described. Those who work with tests will 
find the “Suggested Test Report” (41:104) of considerable value. 

Similarities and contrasts between the Kelley and the Jackson-Ferguson 
discussions may be observed. Both criticized the Kuder-Richardson formu- 
lation of reliability. Kelley would retain the traditional split-half form 
whereas Jackson and Ferguson would substitute measures of absolute and 
relative accuracy of measurement. Both indicated that the assumption of 
a rank of one for the correlation matrix was untenable. Kelley would 
evaluate the rank by his “coefficient of coherence,” while Jackson and 
Ferguson showed that the assumption was sufficient but unnecessary. 

The present reviewer agrees with Jackson and Ferguson that there is 
considerable merit in using the analysis of variance for obtaining more 
specific descriptions, but also recalls that it was Jackson (40) himself 
who warned against the adaptation of methods from agriculture to the 
field of education without making modifications to fit the new conditions 
in education. Some of his arguments appear, by analogy, equally cogent 
in the case of linear hypotheses. More explicitly the linear hypothesis 
does not take into account the important aspect of measurement de- 
scribed as validity, except to describe a lack of it as a biased error or 
error effect (p. 19). This assumption seems somewhat indefensible since 
the error effect is described in terms of the measures themselves. We can 
specify a particular confidence range at any level of significance and yet 
not have validity. Validity is more than a biased error. Bias and accuracy 
of measurement logically show some degree of interaction. Knowing the 
amount of bias will be of no use unless the extent of interaction of the 
bias with the error of measurement is also known. It is this lack of con- 
sideration of the aspect of validity that leads one to question that the 
Jackson-Ferguson formulation is the final answer. 

Hoyt’s (39) use of analysis of variance to compute reliability is also 
of interest. In Hoyt’s method the numerator of his ratio for the determina- 
tion of the reliability was “among individuals” minus “remainder” mean 
square. Dividing this by “among individuals” mean square he got reliabil- 
ity, or dividing it by “remainder” mean square he got Jackson’s gamma 
squared. Apparently the difference between reliability (internal con- 
sistency) and gamma squared was the difference between two yardsticks— 
one calibrated in inches, the other in quarter inches. 
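
The arithmetic described for Hoyt's method can be illustrated with a short sketch. It assumes a complete individuals-by-items table of scores and the usual two-way analysis of variance without replication. The function name and the sample data are illustrative only and are not taken from Hoyt's paper.

```python
def hoyt_reliability(scores):
    """Return (reliability, gamma_squared) from an individuals-by-items table.

    scores: list of rows, one row per individual, one column per item.
    Assumes a complete table (no omitted responses).
    """
    n = len(scores)        # number of individuals
    k = len(scores[0])     # number of items
    grand_total = sum(sum(row) for row in scores)
    correction = grand_total ** 2 / (n * k)

    ss_total = sum(x * x for row in scores for x in row) - correction
    ss_individuals = sum(sum(row) ** 2 for row in scores) / k - correction
    ss_items = sum(sum(row[j] for row in scores) ** 2 for j in range(k)) / n - correction
    ss_remainder = ss_total - ss_individuals - ss_items

    ms_individuals = ss_individuals / (n - 1)
    ms_remainder = ss_remainder / ((n - 1) * (k - 1))

    # Numerator shared by both ratios described in the text.
    numerator = ms_individuals - ms_remainder
    return numerator / ms_individuals, numerator / ms_remainder


# Five hypothetical individuals answering four right/wrong items.
table = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability, gamma_squared = hoyt_reliability(table)
print(round(reliability, 3), round(gamma_squared, 3))
```

The two ratios share a numerator and differ only in the mean square used as the divisor, which is the sense of the yardstick comparison above.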

Other studies should also be cited. Casanova (5) presented formu- 
las to show the effect, upon the reliability coefficient, of changes in the 
variables involved in its estimation. Clarke (7) indicated that predictable 
accuracy in examinations was set by the inconsistencies of performance 
of the same individual and proposed to quantify the function by his 
coefficient of “ubiquity.” Dressel (18) gave another derivation of the 
Kuder-Richardson formula, and Mosier (51) used it to derive a formula to 
simplify its computation. 

Remmers (53), in a series of empirical studies, achieved inconclusive 
results while Ferguson’s (22) discussion may be said to represent a 
rationalization of the problem. Some other empirical studies were reported 
by Carter (4), Cronbach (12), Drake (17), Froehlich (26), and Guilford 
(29). 


Validity 


The fact that test makers and researchers were becoming increasingly 
aware of the problems of meaning and interpretation of results may 
serve to add increasing meaning to their results in the future. Weighting 
and item analysis may be considered as phases of validity since all 
eventually aim for maximum agreement with the criterion. 

Richardson (55) gave a comprehensive discussion of the weighting 
problem and indicated that the choice of a method of combining variables 
depended upon the properties which one wished the composite to have. 
Naturally there could be no single best method to be used in all cir- 
cumstances. He indicated the properties of several of the commonly 
used methods of weighting. In this connection mention should be made 
of Rulon’s suggestion (56) for using test scores regressed in terms of 
their reliability instead of actual scores. Such regression reduces the 
variability of the test scores and gives a result which is more sensible 
than raw scores, so long as one works with a single test. If, however, a 
composite is to be made of several tests one should not weight these 
regressed scores inversely as their standard deviation (to get z-scores), 
for to do so would be to assign the greatest weight to the least reliable score. 
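
The caution about weighting regressed scores can be checked with a small numerical sketch. It assumes that a regressed score is the raw score pulled toward the group mean in proportion to the reliability coefficient. That form, the function names, and the data are assumptions made for illustration, not details taken from Rulon's suggestion.

```python
from statistics import mean, pstdev


def regress_scores(scores, reliability):
    """Pull each score toward the group mean in proportion to the reliability."""
    m = mean(scores)
    return [m + reliability * (x - m) for x in scores]


def inverse_sd_weight(scores):
    """Weight that would bring a set of scores to unit standard deviation."""
    return 1.0 / pstdev(scores)


raw = [10, 20, 30, 40, 50]              # same raw scores on two hypothetical tests
reliable = regress_scores(raw, 0.9)     # test with reliability .90
unreliable = regress_scores(raw, 0.5)   # test with reliability .50

# Regression shrinks the standard deviation by the reliability coefficient,
# so the inverse-SD weight is largest for the least reliable test.
print(inverse_sd_weight(reliable), inverse_sd_weight(unreliable))
```

Under these assumptions the weight for the less reliable test is the larger of the two, which is the result the paragraph above warns against.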

Within this area, as in others, illustrations abound to show that lack 
of mathematical training among psychologists reflects itself in terms of 
partial solutions to problems where relatively complete solutions are 
possible. An example is the study by Forlano and Pintner (25) who used 
an empirical method to determine the percentage of the total groups which 
must be taken at the extremes of a distribution in order to get maximum 
differentiation. They found that even for moderately skewed distributions 
27 percent seemed satisfactory. An analytical solution of the problem would 
have specified the exact percents for the various levels of skewness. 
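
How extreme groups are used for differentiation can be sketched as follows. This is only an illustration of taking a fixed percentage at each end of the distribution; it is not the empirical procedure of Forlano and Pintner, nor the analytical solution called for above. The function name, the rounding rule for group size, and the data are assumptions.

```python
def upper_lower_index(total_scores, item_correct, fraction=0.27):
    """Compare the proportion passing an item in the upper and lower groups.

    total_scores: criterion or total score for each examinee.
    item_correct: 1 if the examinee passed the item, else 0.
    fraction: share of the whole group taken at each extreme.
    """
    n = len(total_scores)
    size = max(1, round(fraction * n))          # assumed rounding rule
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:size], order[-size:]
    p_lower = sum(item_correct[i] for i in lower) / size
    p_upper = sum(item_correct[i] for i in upper) / size
    return p_upper, p_lower, p_upper - p_lower


totals = list(range(1, 12))                     # eleven hypothetical examinees
item = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1]
print(upper_lower_index(totals, item))
```

Changing `fraction` shows how the difference between the groups depends on the percentage taken at the extremes.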

Some methods of item analysis which considered only the relation of 
the item to the criterion were proposed by Daniel (14), Garlough (27), 
and Guilford (29). Toops (61) gave further discussion of his L-method 
which considers the relation of the items to one another, while Guttman 
(32) presented an interesting theoretical discussion of the problem. 

Psychophysical methods were applied to item selection by Ferguson 
(23) and Mosier (50) and to scaling by Grossnickle (28). 

Thomson (60) followed individual items of an intelligence test over 
a period of successive retesting, and found that it was not so much the 
type of item as its difficulty which was of predictive significance. 


Scoring 


Hartog (34) argued that he had demonstrated empirically that English 
compositions written with a given audience in mind had greater educational 
value and were graded more reliably than those written under the usual procedure. 
Weidemann (62) continued his studies of the essay examination. 

Deemer (15) suggested criteria for estimating tolerance limits of scor- 
ing errors per paper, when these errors obeyed the Poisson law. Shen (57) 
contended that the more cautious subjects were unfairly penalized for 
their omissions on matching tests and proposed a formula to correct the 
effects of guessing. Colandra (8) in a theoretical discussion used Bayes’ 
theorem as the basis of a general equation for scoring objective tests. 
The limitations of Bayes’ theorem and the increased scoring difficulty 
make the application of little practical importance. 


CHAPTER VIII 


Tabulating and Test-Scoring Machines: 


Applications of International Business Machines to 
Educational Research 


IRVING LORGE 


Any review of the literature on applications of machine methods to 
tabulation, computation, and test scoring at the present time will be in- 
complete and inadequate. Many applications of machine methods have 
not yet been given formal publication. The present review will be limited 
to the machines developed by the International Business Machines Cor- 
poration and to applications of those machines to educational research. 
No single book covers the variety of machines available or the variety 
of applications possible. It is hoped that the long promised (since 1935) 
publication on “Questionnaires, Standard Codes, and Hollerith Machines” 
by H. A. Toops will be published soon. 

The IBM machines include a large variety of punches, sorters, account- 
ing machines, auxiliary machines, and special devices. The less well-known 
machines are the alphabetical interpreter, which prints punched informa- 
tion across a card; the collator which compares two sets of punched cards 
in order to match them or merge them; the duplicating summary punch; 
the gang summary punch; the automatic reproducing punch; and the auto- 
matic multiplying punch. Some special devices are the cross-footing device 
which makes possible computations such as E = [(A ± B) + C + D]; 
the card cycle total transfer device which enables speedier summation of 
cumulative sums; the card matching device for selection of specified pro- 
files; the digit selector particularly useful for item analysis; the mark- 
sensing device which punches data into punch cards from pencil marks; 
and the test scoring device for grading questionnaires like the Strong 
Vocational Interest blanks. In addition, the IBM machines include the 
test scoring machine with its graphic item counter and the aggregate 
weighting device. 

The machines, in general, make for greater speed in recording, classi- 
fying, and tabulating data, and in computing statistics. Because of great 
speed, the availability of data makes for greater statistical control of multi- 
variate background data and extends the possibilities of research reporting. 


General Books and Articles 


The basic reference in the applications of punched card machines is 
Baehne (2), which includes chapters on the development and principles 
of Hollerith machines (Arkin), on applications to work of registrars’ 
offices, and to the work of university business offices, a cogent article on 
questionnaire construction and analysis (Toops), scoring of the Strong 
(Strong) and of Free Association tests (Kelley), the use of the multiply- 
ing punch (Carver), and the application of progressive digiting to corre- 
lation, variance and covariance, least squares, and differencing (Brandt). 
The Baehne reference is an indispensable background for applications and 
procedures of the punched card machine. 

As an adjunct to Baehne, the various manuals of the International Busi- 
ness Machines Corporation should be utilized (35, 36, 37, 39, 40). Hart- 
kemeier (32) applied the machine to accounting but also illustrated the 
method of digiting in obtaining correlation coefficients. Eckert (22) indi- 
cated the flexibility of the IBM equipment, particularly emphasizing the 
construction of tables, interpolation, and harmonic analysis. Jolliffe (42) 
briefly described the machines and their functions in educational research 
and statistical analyses. McPherson (53) referred to various mathematical 
operations with punched cards in table making, regression equations, 
harmonic analysis, evaluation of determinants, and transformation of data, 
e.g., scaled scores for raw scores. Meacham (52) developed one of the 
few teaching texts for use of the machines, particularly in relation to 
vital statistics. 

Carver (7) and Snedecor (72) described the uses of the machines in 
mathematical computations. Several books give applications to accounting 
(32, 69) and particularly to school accounting (43). Toops developed 
an annotated bibliography on tabulating and recording devices, including 
equipment other than the IBM (82). 


Statistical Applications 


Since the basic operation of the tabulator is the method of arriving at 
sums and cumulative sums, an understanding of the summation method of 
arriving at the values of ΣX, ΣX², and so forth, is indispensable. Perhaps 
the earliest application of the summation method with digiting was made 
by J. C. Dunlap (12). The earliest reference to the use of punched card 
equipment for correlation and multiple correlation was made by Smith 
(71). 

The digiting method was popularized by several statisticians (5, 76, 
87, 88). The summation method was rediscovered (62) and greatly ex- 
tended by Dwyer (21), who summarized the theoretical background for 
the computation of moments with cumulative totals and cumulative multi- 
pliers. Mendenhall and Warren (58, 88) utilized the principle of cumula- 
tive sums and digiting for getting correlation coefficients, giving credit 
to Leavens for the digiting process (49). Dwyer (18) illustrated the 
power of the machines to compute the various statistics that are needed 
from data. 
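
The principle behind the summation method can be shown in a short sketch: once the cases are counted by score value, repeated cumulation from the highest score downward yields the sums needed for moments without any multiplication of individual scores. The sketch assumes non-negative whole-number scores. It illustrates the general idea only and is not the specific machine procedure of any author cited here; the function name is hypothetical.

```python
def summation_moments(values):
    """Return (N, sum of X, sum of X squared) by successive cumulation."""
    top = max(values)
    freq = [0] * (top + 1)
    for v in values:
        freq[v] += 1                      # tally, as a sorter would count cards

    # First cumulation: running count of cases from the top score down to 1.
    first = []
    running = 0
    for v in range(top, 0, -1):
        running += freq[v]
        first.append(running)
    sum_x = sum(first)                    # total of first cumulation equals sum of X

    # Second cumulation: running total of the first cumulation.
    second = []
    running = 0
    for c in first:
        running += c
        second.append(running)
    # Total of second cumulation equals sum of X(X + 1)/2.
    sum_x2 = 2 * sum(second) - sum_x

    return len(values), sum_x, sum_x2


print(summation_moments([3, 1, 2, 3, 0, 2]))
```

Only additions are required until the final step, which is why the method suits a tabulator whose basic operation is the accumulation of sums.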

Dwyer and Meacham (20) demonstrated how to prepare correlation 
tables on the tabulator with digiting by digit selection; Milliman (59) in- 
dicated how digiting may be accomplished without sorting; J. W. Dunlap 
(13) showed how the machine can compute means, standard deviations, 
and correlations with positive and negative numbers; DuBois (10) de- 
scribed how various statistical processes can be done on the card counting 
sorter; Kuder (46) revealed how correlations can be accomplished on 
the test scoring machine. Dwyer (19) and Meacham (57) employed the 
collator for pulling prepunched cards with data in form X, X², XY, X²Y, 
and so forth, to compute moments, product moments, and any tabled 
function. 

Dwyer (17) pointed out that the Hollerith equipment is economical for 
five or more variables with cases in excess of 250. The Friden, Marchant, 
and fully automatic Monroe are more economical for four or fewer variables 
with cases up to 250. 

McPherson (54) gave the background for the mechanical tabulation of 
polynomials for which the machines had been adapted (89). Sandomire 
(68) accumulated cubes directly from punched cards and Feinstein and 
Schwarzschild (26) used the machines for automatic integration of dif- 
ferential equations. Hartkemeier (33) showed how differences may be 
obtained from punched cards. 

In the computation of item validity data, Tucker indicated how to quan- 
tify attributes (84a), Royer (66) demonstrated the steps for obtaining 
biserial correlations, which were somewhat extended by DuBois (11). Stal- 
naker (73) adapted the tabulating machine for computing difficulty and 
validity indexes; Flanagan (29) used Kelley’s upper and lower 27 percent 
for getting estimates of biserial correlations and difficulty; Lindquist (50) 
obtained percents through cumulation of reciprocals for each item choice 
on a test, a technique also used by J. W. Dunlap (14) for getting basic 
data for estimating the value of tetrachoric correlation coefficients. Adkins 
(1) suggested applying the machine for Toops’s L-method of selecting 
items, and Flanagan (28) for approximating regression equations. The 
earliest use of the machines for regression equations was perhaps that of 
Segel (70). 

Flanagan (27, 30) adapted Rulon’s procedure for estimating reliability 
coefficients, and Stalnaker (74) noted the computation of Y values for 
integral values of X. 
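
For readers unfamiliar with the Rulon procedure, a minimal sketch follows. It assumes the commonly cited difference form, in which the variance of the differences between half-test scores is compared with the variance of the total scores. Whether the machine adaptations cited above followed exactly this form is not stated in this review, and the function name and data are hypothetical.

```python
from statistics import pvariance


def rulon_reliability(half_a, half_b):
    """Split-half reliability from the variance of half-score differences."""
    differences = [a - b for a, b in zip(half_a, half_b)]
    totals = [a + b for a, b in zip(half_a, half_b)]
    return 1.0 - pvariance(differences) / pvariance(totals)


# Hypothetical odd-item and even-item half scores for five examinees.
odd_half = [2, 2, 1, 1, 0]
even_half = [2, 1, 1, 0, 0]
print(round(rulon_reliability(odd_half, even_half), 2))
```

The computation needs only sums and sums of squares of the differences and totals, quantities of the kind a tabulator accumulates directly.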

In factor analysis, Tucker worked out an approximate matrix multiplier 
(84), a method of getting Thurstone’s centroid technique from punched 
cards (85) and graphs for the factor patterns (86). 
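
The column sums on which the centroid technique rests can be illustrated briefly. The sketch below computes only the first centroid loadings, assuming that the diagonal of the correlation matrix already holds whatever communality estimates the analyst has chosen and that no variables need to be reflected. It is not the punched-card procedure worked out by Tucker, and the function name is hypothetical.

```python
from math import sqrt


def first_centroid_loadings(r):
    """First centroid loadings: each column sum divided by the root of the grand sum."""
    size = len(r)
    column_sums = [sum(row[j] for row in r) for j in range(size)]
    grand_sum = sum(column_sums)
    return [s / sqrt(grand_sum) for s in column_sums]


# Three hypothetical tests intercorrelating .50, with unities in the diagonal.
matrix = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
print([round(a, 4) for a in first_centroid_loadings(matrix)])
```

Because the work reduces to summing columns, it lends itself to the same accumulating equipment used for other statistics.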


Coding and Card Forms 


The value of the punched card methods depends upon the classification 
and coding of the classifications adopted. Research workers will recognize 
that the coding-classification step is one of the more important elements in 
analysis. One of the most significant contributions to coding was made 
by Dunn (16) in his utilization of the geometric code for extending the 
capacity of a punched card. Toops and Royer extended coding concepts 
greatly, particularly with reference to the uniqueness of the geometric code 
(65, 67, 78, 80, 81). Edwards (23) has shown how coding is a necessary 
antecedent to analysis of medical research data. 
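
The uniqueness property of a geometric code can be illustrated with a sketch. It assumes that the code assigns the values 1, 2, 4, 8, and so on to successive categories, so that any combination of categories adds to a total that no other combination can produce. That reading of the term, and the function names, are assumptions and are not drawn from the papers cited.

```python
def encode_geometric(positions):
    """Sum the geometric values (1, 2, 4, 8, ...) of the categories present."""
    return sum(2 ** p for p in set(positions))


def decode_geometric(value):
    """Recover the categories present from a single coded total."""
    positions = []
    p = 0
    while value:
        if value & 1:          # lowest remaining geometric value is present
            positions.append(p)
        value >>= 1
        p += 1
    return positions


code = encode_geometric([0, 2, 3])    # categories valued 1, 4, and 8
print(code, decode_geometric(code))
```

A single coded total thus records any combination of several yes-or-no attributes, which is how such a code extends the capacity of a card.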

Berkson (4) developed an interesting punched card form for written 
and punched data. The punch operator punches directly from the written 
data which are always in view on the card being punched. Crissy and 
Flanagan (9) adapted the punched card for recording status with reference 
to deciles on various tests so that a profile can be developed. 


Statistical Controls 


Ever since Toops (77) indicated the need for statistical checks on data, 
the machine users have attempted to produce automatic or semiautomatic 
checks on calculations. One of the most significant contributions is that of 
Langmuir (48) on controlling errors in tabulation, card counts, and cal- 
culation. Brandt (6) indicated the effect of coding on calculations and 
their corrections. 


Scoring Methods on Punch Card Machines 


In psychology and education, scoring (particularly of multiweighted 
items) has been a laborious and time-consuming process. Wood's method 
(91) for scoring the Strong blank has made work with it and tests like it 
more practical. Bedell (3) achieved somewhat the same result on the card 
counting sorter, and Rock (63) on the tabulator, particularly for develop- 
ing item weights for such tests. Toops and his students (41, 79) developed 
the mask card in scoring multiple choice tests and in securing item analyses. 
Ross (64) also used the Hollerith for scoring tests. 


Test Scoring Machine 


The scoring machine implies an adaptation of the test item and the re- 
sponse sheet to the scoring machine (24, 44, 45, 47, 90). The items have 
to have a specific form and tend to be mostly of the recognition type. Very 
few investigators, however, have questioned the use of separate answer 
sheets as compared with direct response in the test booklet. McCullough 
and Flanagan (51) found the machine scoring form as valid as the booklet 
type. Traxler and Hilkert (83) noted that students who had plenty of room 
to take the test booklet with its answer sheet on a table did better than those 
who had to manipulate the booklet and answer sheet on arm chairs. This 
finding was statistically significant in only one of seven comparisons, but 
if proved generally it will imply separate norms for different conditions of 
administration. Dunlap (15) in an elaborate research has shown that the 
separate answer sheet does not affect reliability or validity of results. 


Statewide and Other Applications 


The possibility for accelerating scoring, reporting, and computing with 
the entire IBM equipment has been realized in many statewide testing 
programs (38). Some of the many applications and adaptations have not 
been published. Illustrations of the flexibility of the equipment are given 
by Feder (25), McQuitty (55, 56), Mosier (60), and Stromberg (75). 
Cox utilized the equipment for research and student guidance, as do most 
of the testing installations. 

Pinkus (35, 61) adapted the machine for high-school programming in 
a large urban high school, and Hall and Henderson (31) in evaluating 
the success of teams in judging cattle and crops. 

With the impetus given to speedy test scoring, test reporting, and test 
analysis by the classification workers in the Army, Navy, and Marine 
Corps, new adaptations of the machines have been, and will continue 
to be, made. It is hoped that these applications and procedures will be 
published for the benefit of research workers in all fields. 


CHAPTER IX 


Reporting, Summarizing, and Implementing 
Educational Research 


DOUGLAS E. SCATES 


How Should Research Findings Be Publicized? 


“How can the educational press help in making research effective in class- 
room practice?” This was the subject discussed by the Educational Press 
Association of America in its meeting in Atlantic City on February 25, 
1941. One speaker, thinking primarily of survey research, fact-finding, 
and general evaluation, noted the great need for large scale cooperation 
on the part of field workers and the importance of group action in follow- 
ing up the survey to interpret findings and see that appropriate steps 
were taken to set the forces of improvement in motion. He urged a greater 
amount of publicity for such studies before, during, and after their prose- 
cution. Another speaker, thinking primarily of “scientific” research—de- 
tailed experiments, laboratory work, the analysis of causal factors and 
mechanisms—noted the slow progress of scientific knowledge, the explora- 
tory character of many investigations, the conflicting results of experiments, 
and the need for true scientific caution in accepting findings until they had 
been verified by many studies attacking the problem from a variety of 
approaches. He urged that due time be allowed between the reporting of 
a pioneer study and the adoption of practices in the classroom which 
devolved upon the new findings. Science has often been wrong. Much harm, 
both to children in the schools and to the esteem with which educational 
research is held, will inevitably follow upon the hasty acceptance of un- 
tested research findings. 

There is apparently more than one type of educational research and 
more than one appropriate type of reporting. Certain undertakings un- 
doubtedly need a broad base of social support; such are the studies which 
deal with problems that social action can handle. Certain other investiga- 
tions need careful scientific scrutiny and testing over a period of years 
before they are ready to be thrust upon large groups with the admonition 
to accept them, for they are “scientifically proved.” Such studies need 
to be reported in technical journals which will excite the interest of other 
workers competent to study them further. In due time they will be ready 
for inclusion in yearbooks, textbooks, and teachers’ magazines. If science 
is not to mislead, it must be science—and not the results of some one or 
several studies made by workers having the same background of ideas. 
No finding is safe until it has been critically examined and tested by 
workers with diverse points of view. Our efforts to have research reported 
more widely and immediately should therefore be discriminating. 


Need for Synthesis, Interpretation, and Evaluation of Research 


It may be that there are certain deficiencies in the existing provisions 
for research investigations and reporting which retard progress both in 
practice and in theory. Reports of research findings need to go through 
a number of steps before they are ready for ultimate use. Our division of 
work in the United States may possibly be weak in some one or another 
of these points along the line. Possibly more facilities for reviewing, inter- 
preting, synthesizing, and evaluating are needed. There is always danger 
that with various schools of thought and individual research centers pur- 
suing their own special problems, no one will do the necessary work of 
synthesizing the various findings and weaving them into a “nonpartisan” 
whole which will have meaning and value for the consumer of research. 
In building our knowledge, the facts that add perspective are as important 
as the facts that add detail. 

Such an integrating, evaluating task is entirely different from the direct 
dissemination of results of individual studies. We undoubtedly need, in 
education and psychology, more “research philosophers” who are not pri- 
marily engaged in producing research themselves but who will assemble, 
relate, and find both theoretical and practical meanings in the great mass 
of research which is completed every year. The conclusions of such inter- 
preters would differ, but there is no reason for assuming that their work, 
based on existing research, would be any more in error than are many of 
the individual research studies. Such a process was one of the goals of 
Walter S. Monroe in preparing his Encyclopedia of Educational Research 
(52), when he urged each writer to tell, “What does the research to date 
add up to?” This function is discharged to a certain extent by the Review 
of Educational Research, though most contributors lean in the direction 
of being comprehensive and noncommittal rather than in the direction of 
interpreting, evaluating, and synthesizing. 


Need for Direct Study of Applications 


We also need persons who view research findings from a background 
that is charged somewhat more with the needs of the practitioner than 
with the abstractly technical and scientific considerations of the research 
producers. Such work can often be effectively accomplished by committees, 
if the persons with different backgrounds are willing to make some con- 
cessions to the criteria of those who have opposing frames of reference. 
At any rate, all must recognize that when research is to be applied, some 
of its “purity” must be given up—at least for the time. The attempt to 
apply will, however, lead to new problems demanding further study. Any 
application of science calls for a continuing adjustment between various 
factors, as one after another is modified or as general goals and purposes 
change. 

We may, for example, note this sequence of study—application—study, 
in the case of radio. Although radio is based on the fundamental principles 
of electromagnetic wave propagation set forth in mathematical terms by 
James Clerk Maxwell some seventy years ago, radio did not immediately 
spring into existence. Research on radio tubes is still under way, and it 
appears from the current full-page advertisements of the General Electric 
Company concerning its developments in electronics that a new area of 
possibilities has just been uncovered. And of course radio research must 
deal not only with tubes but with all aspects of radio—combinations of 
tubes, improvement of parts of the circuit, groupings of parts into new 
circuits, sound, adaptations to various purposes, market research, and so 
forth. It is probable that research on radio, both on the physical side and 
on the market side, will continue indefinitely. Even though radio as we 
think of it lies in the field of technology, the great number of problems 
which arise in the applications of the various scientific principles reveal 
countless lines for “pure” research to follow up—lines that would other- 
wise never be dreamed of. We would have no science of electronics if we 
had not first had the radio tube. Pure science, when it is alert, profits from 
applications as much as does the consumer. The same condition holds for 
research in education. The problems of application are as great as the 
problems of original discovery—and are in the long run as stimulating 
to “pure” research. 

Occasionally just the change from small-scale to large-scale procedures 
will involve new difficulties. This condition is well illustrated in medical 
chemistry by the early work with insulin. After the hormone had been 
discovered, had been experimented with both in the laboratory and in 
the clinic with highly satisfactory results, its manufacture on a large scale 
was undertaken. The results “were at first extremely unsatisfactory and 
disappointing. Indeed, for some months (in 1922), although every stage 
of the laboratory process was apparently being duplicated step by step 
on the larger scale, it seemed impossible to obtain anything like the ex- 
pected yield of insulin. It almost looked as if we had lost the secret of pre- 
paring insulin. Some unknown factor had evidently crept in, leading to 
the destruction of the hormone during its large scale extraction. . . . It 
was several months before the difficulties were gradually overcome. . . . 
It has since transpired that a chief cause for the difficulty depended on 
inadequate control of the degree of acidity at certain stages in the extractive 
process.”¹ The transfer of science from the laboratory to large-scale use 
may be expected always to have attendant problems requiring further study 
and research. 

¹ J. J. R. Macleod. “Insulin to the Rescue of the Diabetic.” In Chemistry in Medicine, Julius Stieglitz, ed. 
New York: The Chemical Foundation, Inc., 1928. p. 304-305. 


Examples of Research Summarization, Synthesis, 
and Interpretation 


The joint yearbook of the Research Association and the Department 
of Classroom Teachers (3) is a recent illustration of the summarization 
and interpretation of research findings for direct classroom use. It is 
the current counterpart of the Eighteenth Yearbook of the National So- 
ciety for the Study of Education, Part II, presenting the Fourth Report 
of the Committee on Economy of Time in Education, issued in 1919. 
Such reports are pointed toward the practitioner. 

The Encyclopedia of Educational Research (52) was an effort to pre- 
sent “comprehensive and critical syntheses of the results of educational 
research organized in a conveniently usable form. . . . It is the purpose 
to tell what the findings of research ‘add up to’ after critical evaluation 
and what this synthesis of findings means relative to educational theory 
and practice” (p. vii). The Encyclopedia stands about midway between 
the original research reports and the interests of the ultimate consumer 
of research. It is of service to both groups of users. 

In the general category of interpretative summaries of research should 
be mentioned the five-volume report now being published by the Pro- 
gressive Education Association (64) covering the Eight-Year Study of 
high-school and college articulation, and the final volume of the Amer- 
ican Youth Commission (6) summing up their several years of study. 
Other aspects of interpretation, involved directly in implementation, will 
be discussed toward the close of the chapter. 

The 1938 Contributor’s Manual for the REVIEW OF EDUCATIONAL RESEARCH 
states that “The Review is to be factual, evaluative, and suggestive.” 
A chapter should include “A summary description of the methods, 
setting or background, placing the research properly in its field. . . 
critical evaluation indicating weaknesses and strengths, . . . a general 
overview of the field indicating the contribution of the research to 
practical problems and to a developing generalization.” The 1942 edition 
states: “The purpose of the REVIEW OF EDUCATIONAL RESEARCH is to 
review—summarize, synthesize, interpret, and evaluate—current research 
in education.” As indicated previously, many contributors to the Review 
lean more toward a catalog of current research than a synthesis. The 
significance of research for practice is seldom pointed out; probably the 
Review is too close to the original reports for this to be an appropriate 
step. The Review is of primary service to students or others about to 
undertake a piece of research in a field that is somewhat new to them, 
and to those who wish to keep abreast of research activities in a variety 
of fields in education. 

It will be noted from looking over interpretative syntheses that they 
almost necessarily involve going beyond the data and drawing more 
general or more particular conclusions than the facts will entirely support. 
The more the summaries attempt to interpret research for practice, and 
the further away they get from the original reports, the more this is true. 
Such a step is not to be shunned. It is one phase of “applied science” — 
technology. Science itself deals usually with single, narrow aspects of 
phenomena; the practitioner needs to consider all the aspects of the 
situation in which he is working. Such practical interpretations of research 
are, therefore, to be regarded as approximations and, perhaps, as starting 
points for further study, when the results in the field do not approximate 
those obtained in the more theoretical work. By way of analogy, it is easy 
for physicians to destroy viruses in the laboratory, but they have prac- 
tically given up the attempt to stop or control a cold in one’s head 
through the use of antiseptics. “In this case general therapy is more 
effective than local therapy,” one is likely to be told. All syntheses of 
theoretical findings must be regarded as highly tentative for practice; 
the application usually involves difficulties which require that the theo- 
retical conclusions be restudied in the light of practical factors. 


Overviews and Compilations; Bibliographical Summarization 


There are a number of summaries which present overviews of certain 
research areas or periods, usually with some interpretation but without 
a definite attempt to synthesize into a general theory or point of view the 
findings of the research reviewed. Outstanding among these are the annual 
reviews of large research undertakings and implementing projects by 
Good (32) which have been published for the past five years. Good also 
prepared a somewhat longer résumé and orienting discussion (33). Scates 
(76) prepared a brief overview of research for the past decade. Carr has 
prepared the seventh listing of Deliberative Committee Reports (20); 
references to the earlier annual compilations are included. These lists are 
annotated and help give an annual picture of the findings and decisions 
of various committees that are working with educational problems. 

If we include those publications which merely select, classify, and 
make more available the original articles, we should mention such peri- 
odicals as Education Abstracts, Education Digest, and the Loyola Educa- 
tional Digest. There is also the work of the National Education Associa- 
tion Research Service in publishing Education in Lay Magazines. This 
periodical was recently analyzed for a ten-year period by Hughes 
(43). A section of the Journal of Educational Research, “Research News 
and Communications,” edited by C. V. Good, is given over each month 
to the publishing of news notes about current research activities. Purely 
bibliographical services were treated in Chapter I of the present REVIEW. 


Procedures and Criteria for Preparing Technical Reports 


This section deals with the more immediate or detailed aspects of re- 
ports. Treatises on the preparation of research reports before 1936 were 
covered by Good, Barr, and Scates (35: Chapter 13) with 126 references, 
and by Culver (26) with 62 references. Whitney’s 1937 text or his revised 
edition (90: Chapter 16) includes about 50 references. Nine more or less 
brief treatments on the preparation of theses or undergraduate research 
papers are: (19, 23a, 24, 34, 36, 44, 61, 86, 94). Several papers on desir- 
able forms in which to report statistical facts to businessmen, news re- 
porters, and other consumers (11, 29, 45, 96) grew out of a program 
of the American Statistical Association on this subject. One may wish to 
consult publications on the preparation of research reports in engineering 
(8, 42) and in chemistry (56, 69). 

Various forms of citation for use in educational work are included in the 
educational treatises of the preceding paragraph. Forms of citation for psy- 
chological publications were discussed in three places (30, 46a, 83). A re- 
vised edition of the Government Style Manual (87) has appeared. The 
University of Chicago Manual of Style (23) has gone thru its tenth edition. 
Manuals for contributors to the REVIEW OF EDUCATIONAL RESEARCH were 
issued in mimeographed form in 1938 and in 1942. Other publishers have 
prepared form books and manuals for their own authors. For current list- 
ings see the head, “Stylebooks (printing)” in the Education Index. 

Segel, in Chapter III of this issue, notes the need for specifying condi- 
tions and units of enumeration; and Blommers and Lindquist, in the closing 
section of Chapter IV, mention the need for clearer and more adequate de- 
scriptions of the technical aspects of what was done, in reporting any 
investigation. 

Popularized reports and graphing are discussed in later sections. For cur- 
rent references on reporting, one may see the head “Reports: Preparation” 
in the card catalog of a library; Good’s “Selective Bibliography on the 
Methodology of Educational, Psychological, and Social Research,” Section 
III, usually in the September issue of the Journal of Educational Research; 
and “Research, Educational: Techniques” in the U. S. Office of 
Education annual Bibliography of Research Studies in Education. 


Evaluation of Reporting Mediums; Problems of Publishing 


Various statistical attempts have been made to estimate the service 
which a particular magazine is rendering. Circulation is one basis for 
estimating service and appeal. The number of persons who read each 
magazine is a second index (35:126-30). The number of times articles in 
a magazine are cited in articles appearing in other professional maga- 
zines is a third form of evidence concerning worth. Direct judgments 
or evaluations constitute a fourth method. An analysis of the third type was 
made of several psychological periodicals some time ago by Cason and 
Lubotsky (21) and has been done more recently for educational peri- 
odicals by Wilkins and Anderson (91, 92, 93). As is the case in all 
evaluations based on frequency studies, considerable care must be ex- 
ercised in drawing conclusions. Similarity or commonness of interest 
and purpose between the periodicals cited and those doing the citing 
and between the citing periodicals and the reader must be assumed before 
such frequency counts have significance either in general or for the 
prospective reader or subscriber. Steele (81) prepared a rating scale for 
book reviews and applied it to certain journals. 

Publication lag was discussed in two places (66, 67) in psychological 
magazines and was discussed in educational magazines by Donohue (28). 
Microphotography or microprint as a method of publishing research was 
treated in Chapter I. 


Dull vs. Appealing Reports of Activities 


In recent years there has been something of a revolution in the public 
reports of the activities and statuses of school systems and corporations. 
If reports are worth making, they are worth reading; if they are to be 
read, they must have widespread appeal. So superintendents, business 
managers, municipal governors, and industrial corporations have con- 
cluded that if their reports to the public or to stockholders are dull or 
not instantly comprehensible, it is the fault of the persons who get the 
report out, for reports can be glamorous! Accordingly, many reports have 
been made much more appealing, usually thru the use of more photo- 
graphic or other graphic material, and through the decrease of tables 
and routine records of departmental work. The older reports were 
commonly couched in technical or semitechnical terms and involved 
matters of far more concern to the persons in charge than to the common 
reader. 

While we are not here intimately concerned with superintendents’ re- 
ports, inasmuch as they cover activities and seldom deal with research, 
we may nevertheless have occasion to consider some of the methods they 
have employed. There are times when research could well afford to 
stimulate the fancy and interest of lay people in its endeavors, its problems, 
and the ingenuity and patience of its workers, even though it makes no 
effort to acquaint people with its detailed findings. Numerous current 
popular reports of medicine and physical science do a great deal to keep 
these fields in high esteem in the public mind. Educational research, 
like school administration, is in need of taking time occasionally to 
glorify its work and share with others the hopes that impel its workers 
forward in their long, monotonous tasks and the difficulties and dis- 
couragements that often beset the explorer seeking something which, even 
after years of effort, he is not sure is there. 

The changed character and purpose of many superintendents’ and 
business managers’ reports were reviewed at some length and with con- 
siderable clarity by Theisen (84). Arnold and Castetter (7) mentioned 
three more articles. Educational Research Service prepared a bibliography 
(2) on superintendents’ reports. We shall make reference here 
only to six articles commenting on the character and preparation of 
such reports (10, 16, 27, 46, 62, 72). 

It seems appropriate to mention as examples four recent school reports 
which are conspicuous among those which are highly pictorial: (a) 
The Springfield, Missouri, 1938 report (80) carried about 60 percent 
pictures and no tables, and was published “to give you as taxpayers 
some notion of our conception of the meaning of democracy and its im- 
portance, and some idea as to the school’s responsibility for and con- 
tribution to preserving and making democracy work better.” (b) The 
Rochester, New York, 1943 budget (71) carries six pages of pictures, 
captions, and paragraphs at the outset, and the budget is presented in 
the remaining twelve pages with a moderate amount of text and about 
20 percent pictures. Graphs are pictorialized in various ways. The Rochester 
budget for some years has been an inspiring publication. Its present 
character took form in 1936; its earlier popular form dates back to 1927. 
(c) The Fostoria, Ohio, report for 1941 (31) carried about 90 percent 
pictures. Three pages were “thermometers,” showing the ratings of the 
schools on eighteen aspects according to the Cooperative Study of Sec- 
ondary School Standards. It is interesting as a report of a relatively 
small city. (d) The Chicago report for 1941 (22) was profusely illustrated 
(about 30 percent of space) thru some 500 pages, followed by about 
fifty pages of tables. (The report covers the years 1936-41.) It was or- 
ganized around activities in the school curriculum which were of im- 
mediate interest to parents and the general public. (e) In addition to 
these reports for individual cities we should mention a popular report 
of the U. S. Office of Education (88) which utilized pictorial and dia- 
grammatic material to portray its manifold activities. There was no 
separate text. 

As treatises on the preparation of reports in business and city govern- 
ment, five references will be cited (8, 70, 75, 77, 78). Other references 
may be found in the card catalog of libraries under “Reports: Prepara- 
tion” or “School Reports”; in the Education Index under “Reports and 
Records”; and in the U. S. Office of Education annual Bibliography of 
Research Studies in Education under “School Management: Reports and 
Records.” 


Visualized Presentations of Research Findings 


While the reports of activity and status referred to in the preceding sec- 
tion relied largely upon photographs, because of their appeal and por- 
trayal value, reports presenting research results, survey findings, and sum- 
marized data have made more use of one form or another of graphs. Thus 
the Regents’ Inquiry of New York State summarized the results of “fifteen 
printed volumes and many typewritten and mimeographed reports” in a 
46-page Primer (39), consisting of one or two sentences of text on each 
page with a pictograph occupying the rest of the space. In the same year 
the New York State Education Department published a bulletin (60), 
consisting of thirty-six full-page pictographs with a short paragraph of text 
accompanying each, to present a brief history of the schools of the state 
and their problems. The American Youth Commission’s Youth Tell Their 
Story (9) contained sixty-two charts and pictographs (one diagram for 
every 3¼ pages of text) in addition to ninety-nine tables. Shuttleworth’s 
presentation of factual material on growth during adolescence (79) de- 
pended wholly on graphs and pictures, with accompanying paragraphs of 
explanation; there was no separate text. Goodykoontz departed from the 
traditional style of the U. S. Office of Education summary bulletins (37) 
by omitting ordinary tables and diagrams and introducing a number of 
pictorialized graphs. 


Other Visualized Factual Material for the Nontechnical 
Consumer 


We shall refer here briefly to several other publications which are good 
examples of the technique of presentation being discussed. The War 
Department innovation in a textbook (89) which is devoid of text apart 
from brief comments on the graphs is interesting both because of its 
subjectmatter and its form. A graphic history of the United States (40) 
has about one-third of its space given over to an ingenious variety of 
pictographs. The magazine Building America, published by the Americana 
Corporation for the Society for Curriculum Study and now in its eighth 
volume, gives roughly 40 percent of its space to text and 60 percent to 
pictures and graphs. The set of pamphlets published each year by the Trav- 
elers Insurance Company (Hartford, Connecticut) on automobile accidents, 
for free distribution to schools, and the annual publication Accident Facts 
(55) are excellent examples of how simple, “dry” statistics can be made 
most attractive. Hoban, Hoban, and Zisman (41: 198-207) not only dis- 
cussed visualized textbooks but prepared a treatise on visual materials for 
classroom use which may itself be cited as an example of visualized text. 
The Public Affairs Pamphlets (distributed to schools by Silver Burdett) 
and the Headline Books of the Foreign Policy Association are active series 
which make extensive use of pictographs. 

Neurath’s semipopular treatise on modern man (59) included many pic- 
torial diagrams which he states “do not merely act as illustrations or as 
eye-bait; they are parts of the explanations themselves” (p. 7). As other 
examples of books in which the visual material is integrated with the 
text and forms a part of the continuity, instead of serving merely as 
exhibits, we may mention two books by Caldwell and Bourke-White (17, 
18) which represent photographic and verbal reporting of American life. 
A publisher’s blurb states: “Two creative artists have developed and mas- 
tered an original and brilliant pattern of book-making. The face of a 
nation has never before been recorded so richly as here.” 


An Outstanding Graphic Development: Pictographs 


One cannot read the references in the two foregoing sections without 
realizing that a new type of graph has come to occupy a leading position 
among graphs designed for popular consumption. This newcomer is the 
pictograph. It not only does an admirable job of presenting comparative 
quantities but, in the capable hands of its leading exponents, it has been 
adapted to clarifying and presenting complex relationships, including cause 
and effect and story sequences. In spite of doubts and inhospitality which 
have been prevalent among technical workers during the past two decades, 
this type of graph has swept into indisputable dominance of the graphic 
field wherever relatively popular reports of a few quantities are involved. 

The pictograph is essentially a series of suggestive symbols repeated to 
form pictorial bars. There are other developments of pictograph, in the 
realm of maps and sequences, which involve primarily simplification and 
suggestion. The pictograph is not appropriate where curves are called for; 
but frequently the popular graph can, and should, be simplified to the point 
where a few values will suffice in place of the larger number of observa- 
tions giving rise to a smooth curve. At a meeting of the American Statis- 
tical Association in Chicago (December 28, 1940) devoted to “Principles 
and Procedures for Putting Across Business-Statistics Reports to Execu- 
tives,” several speakers and other commentators expressed dislike and dis- 
dain for the pictograph. It was regarded as a highly complicated and frilled 
device which commonly ends in fractions of a symbol and generally 
distracts one’s attention from the main point. Brinton (12: 12-13), in his 
preface, implies that it was developed in countries where the people had 
a low level of literacy and is something of a “weed” in America. It seems 
probable that the professional statistician, thinking only of a technical or 
intrinsically interested reader, will view such graphs differently from what 
the general reader will. Pictographs, where they are well designed, catch 
one’s attention, getting many a reader to pause long enough to look at 
the graph. They hold the attention on each bar, or category, longer than 
a simple printed label would, thus giving time for more comprehension. 
They add an emotional element of interest, enjoyment, and satisfaction 
to the bare facts which are presented. And, again when they are well done, 
they represent a careful analysis of the problem and a simplification to 
the most important aspects so that the main points stand out clearly—which 
is just what a graph should do. If they go further in this direction than 
the highly trained statistician would go, perhaps the statistician needs more 
training along the line of how the common person thinks. 

The pictograph was developed in a social science museum in Vienna 
during the early ’twenties, principally under the sponsorship of Otto 
Neurath, with Rudolf Modley as a member of the staff. While earlier 
isolated instances of this type of graph were produced or quoted in several 
textbooks (1; 12: 122-24; 73; 74), in the United States nothing was done 
to exploit its possibilities. Modley came to America in 1930, joining the 
staff of the Chicago Museum of Science and Industry. In 1934 he helped 
found, and became executive director of, Pictorial Statistics. This agency, 
which renders a commercial service to schools, authors, and publishers, 
has lately used the name of Pictograph Corporation because its work has 
extended to include nonstatistical graphing. Neurath left Vienna in 1934 
and helped establish the International Foundation for Visual Education 
at The Hague, Holland. He is now at Oxford, England. 

The movement which developed the pictograph has involved far more 
than just drawing graphs; it has embodied the searching for social facts 
and relations which are of importance to present. It has led to social 
museums, foundations, and a great increase of interest in social statistics. 


The history of the movement to 1937 was presented by Modley (47: 
Chapter 13). 
Examples of pictographs are far too numerous to mention here. Many 
of the reports cited in previous sections employ them (9, 39, 40, 59, 60, 
89, and others). Modley (47: 158-66) gives an extensive bibliography 
for publications in Europe and in the United States to 1937. 


Treatises on the Production and Use of 
Pictographs and Other Graphs 


The principal discussion of pictographs and methods of making them 
is by Modley (47) in 1937, a work which is now being revised. His best 
brief discussion of types of pictograph to portray various facts (51) is 
probably not widely available. Other discussions, however, serve to round 
out the picture (41: 242-47; 48; 49; 50; 95). Three of these references 
(41, 48, 50) deal particularly with the use of pictographs and other 
charts for teaching purposes in the classroom. A catalog of 1000 Pictorial 
Symbols (63) has been published illustrating pictograph symbols which 
are available for use by others (cost, 5 cents each, to schools). Two small 
books by Neurath (57, 58) discuss the use of pictographs for an inter- 
national symbolic language. 

A point that is likely to be overlooked by the novice and omitted in the 
treatises is that the success of a pictograph depends upon something more 
than clever symbols. The production of a good graph involves, first, an 
analysis of the principal facts or elements of the story to be represented; 
second, the selection of the appropriate type of graph; third, judgment as 
to proportion, titles, legends, and so forth, so that the final product is a 
graph that is simple, clear, bold, engaging, and accurate in the impression 
it conveys. A successful graph depends far more on careful thought and 
judgment than on technique. 

This section is not concerned with graphing in general but only with 
graphs which are particularly important in the reporting of research and 
summarized facts. In this connection we note two publications of basic 
importance (4, 5) by committees of the American Society of Mechanical 
Engineers. These reports (one of which is in press at the present writing) 
represent a long-delayed answer to hopes which were engendered some 
twenty-eight years ago when a committee was set up to produce standards 
for graphic presentation. The preliminary report, published in August 
1915, was reproduced in many textbooks, but was generally regarded as 
something of a makeshift. It is therefore a great satisfaction to have ex- 
tensive, detailed, and authoritative reports covering two large groups of 
graphs—time-series charts, and engineering and scientific graphs. Brinton’s 
pioneer treatise on graphic methods (in 1914) has been superseded by 
a revised edition (12) which, though it lacks the adequate textual setting 
of the earlier volume, presents a great variety of graphic possibilities. 
One further reference (68) may be of interest since it reveals statistical 
graphs from the point of view of a cartographer. Other treatises will be 
found by consulting the head “Graphic Methods” in the card catalog of 
a library or in the Education Index. 


Implementing Programs and Publications 


A brief note on implementation will close this chapter on reporting and 
also finish this research volume which was appropriately begun with a 
discussion of library procedures. Getting research into practice completes 
the research cycle—and also begins a new one, for further research will 
have to be done on how well a device or procedure works in practice and 
how to improve its functioning in practice. 

According to the new Webster’s Dictionary of Synonyms, published this 
fall, the verb “implement” “has seen rapid development in its 
implications . . . especially since early in the third decade of the twentieth 
century,” which was about the time it greeted our ears on every hand in 
educational conventions and deliberative committee meetings. The word 
“usually suggests reference to . . . proposals or projects which have 
been accepted, policies which have been adopted, and the like, and implies 
the performance of acts that definitely carry them into effect or ensure 
their being put into operation.” 


In a sense, every educational organization in the country is engaged in 
implementing its program, and so far as this deals with educational re- 
search, it is deserving of recognition here. We must therefore content our- 
selves by referring to a few outstanding examples of what is going on 
somewhat widely. The Implementation Commission of the National 
Association of Secondary-School Principals has been at work for several years, 
stimulating publications which help translate general principles into terms 
of definite school action. The Discussion Group Project, begun in 1937, 
is another outcome of the work of this Commission. The Cooperative Study 
in General Education and the Commission on Teacher Education, both of 
the American Council on Education, and the Stanford Social Education 
Investigation are other examples of implementing work, concerned with 
carrying significant facts and research findings to the cooperating institu- 
tions and developing applications. 

The Progressive Education Association Commission on the Secondary 
School Curriculum published, through various subjectmatter committees, 
a number of implementing treatises. Reference to one of these volumes 
will serve as an example. The treatise on science (65) not only details 
the general program but contains a chapter on “How the Teacher May 
Make Use of the Report” and gives several extended illustrations of 
science units. Reports of the National Committee on Science Teaching of 
the American Council of Science Teachers helped further to implement 
general policies in this same area. A recent series of publications by the 
U. S. Office of Education represents an implementation of our general policy 
of teaching democracy and the fundamental issues of the war in the public 
schools. The new series of pamphlets on problems in American life (54) 
is instructional material to implement a belief that contemporary prob- 
lems should be taught in the high school. The joint yearbook of the Ameri- 
can Educational Research Association and the Department of Classroom 
Teachers (3) is an example of attempting to implement all educational 
research which bears directly on teaching. Further details and other exam- 
ples of implementation will be found in the general summaries of research 
activities by Carr (20), Good (32, 33), and Scates (76), cited earlier, 
and in Good’s section, “Research News and Communications,” appearing 
monthly in the Journal of Educational Research. A committee of the 
American Council on Education outlined implementing practices and principles 
(2a, 38a). 


Summary Statement 


A number of writers have commented on the general relation of research 
and classroom practice (3, 13, 14, 15, 25, 38, 53, 82, 85). Some are 
impatient that research findings are not more quickly and more generally 
introduced into the classroom. Some think the difficulty lies in inadequate 
and improper reporting of research. The basic attitude seems to be, “What 
is worth knowing is worth putting into practice.” The difficulty is that we 
do not know something as quickly as we are likely to think we do. What 
the teacher or administrator may regard as proved beyond question, the 
more cautious research worker may regard as only in the beginning stages 
of exploration. It is certain that we shall do both research and educational 
practice an injury if we take the attitude that the findings of every study 
should at once be made available for putting into practice. Much of our 
research is for science rather than for practice, and all research needs 
mellowing. The best course for getting research into practice is not through 
more widespread and direct reporting of findings to teachers and adminis- 
trators but rather thru more deliberative and implementing committees 
which will carefully weigh results in a broad field of study, synthesize the 
findings into consistent theories, express principles in concrete terms, adapt 
generalizations to local conditions, try out recommendations, and then 
offer conclusions for general action by workers in the schools. Local com- 
mittees of teachers and administrators who have some research perspective 
and who understand that the applications of “raw” findings involve fur- 
ther study and research, can often help in the process, especially when 
they work in cooperation with a university professor who understands 
both research and teaching. Such procedures take time, yes, but they 
represent the only safe course for both the schools and research. 